Abstract

A lower bound on the minimum error probability for multihypothesis testing is established. The bound, which
is expressed in terms of the cumulative distribution function of the tilted posterior hypothesis distribution given
the observation with tilting parameter $\theta \ge 1$, generalizes an earlier bound due to Poor and Verdú (1995). A
sufficient condition is established under which the new bound (minus a multiplicative factor) provides the exact
error probability in the limit of $\theta$ going to infinity. Examples illustrating the new bound are also provided.

The application of this generalized Poor-Verdú bound to the channel reliability function is next carried out,
resulting in two information-spectrum upper bounds. It is observed that, for a class of channels including the
finite-input memoryless Gaussian channel, one of the bounds is tight and gives a multi-letter asymptotic expression
for the reliability function, although its determination or calculation in single-letter form remains a challenging
open problem. Numerical examples regarding the other bound are finally presented.
Hypothesis testing, probability of error, maximum-a-posteriori and maximum likelihood estimation, channel
coding, channel reliability function, error exponent, binary-input additive white Gaussian noise channel. 
I. Introduction

In [12], Poor and Verdú establish a lower bound to the minimum error probability of multihypothesis
testing. Specifically, given two random variables $X$ and $Y$ with joint distribution $P_{X,Y}$, with $X$ taking values
in a finite or countably-infinite alphabet $\mathcal{X}$ and $Y$ taking values in an arbitrary alphabet $\mathcal{Y}$, they show that
the optimal maximum-a-posteriori (MAP) estimation of $X$ given $Y$ results in the following lower bound
on the probability of estimation error $P_e$:

$$P_e \ge (1-\alpha)\, P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P_{X|Y}(x|y) \le \alpha\} \tag{1}$$

for each $\alpha \in [0,1]$, where $P_{X|Y}$ denotes the posterior distribution of $X$ given $Y$ and the prior distribution
$P_X$ is arbitrary (not necessarily uniform). This bound has pertinent information-theoretic applications such
as in the proof of the converse part of the channel coding theorem that yields formulas for both capacity
and $\varepsilon$-capacity for general channels with memory (not necessarily information stable, stationary, etc.) [14],
[12]. It also improves upon previous lower bounds due to Shannon [13], [12, Eq. (7)] and to Verdú and
Han [14], [12, Eq. (9)].

Furthermore, Poor and Verdú use the above bound to establish an information-spectrum based upper
bound to the reliability function $E^*(R)$ (i.e., the optimal error exponent or the largest rate of asymptotic
exponential decay of the error probability of channel codes [9], [5], [8], [15]) of general channels [12,
Eq. (14)]. They conjecture that this bound, which is expressed in terms of a large-deviation rate function for
the normalized channel information density (see Section IV-A for the definition), is tight (i.e., exactly equal
to $E^*(R)$) for all rates $R$. In [1], it is however shown via a counterexample involving the memoryless
binary erasure channel (BEC) that the bound is not tight at low rates, and a slightly tighter bound is
presented [1, Corollary 1].

In this work, we generalize the above Poor-Verdú lower bound in (1) for the minimum error probability
of multihypothesis testing. The new bound is expressed in terms of the cdf of the tilted posterior distribution
of $X$ given $Y$ with tilting parameter $\theta \ge 1$, and it reduces to (1) when $\theta = 1$; see Theorem 1. We also
provide a sufficient condition under which our generalized Poor-Verdú bound, without the multiplicative
factor $(1-\alpha)$, is exact in the limit of $\theta$ going to infinity. Specifically, the sufficient condition requires
having a unique MAP estimate of $X$ from $Y$ almost surely in $P_Y$, where $P_Y$ is the distribution of $Y$; see
Theorem 2. We present a few examples to illustrate the results of Theorems 1 and 2.

We proceed by applying the above results to the reliability function $E^*(R)$ of general channels. We
employ Theorem 1 to establish two information-spectrum upper bounds to $E^*(R)$; see Theorem 3. One
upper bound, $E^{(\theta)}_{\mathrm{PV}}(R)$, is a function of the tilting parameter $\theta$, while the other bound, $\bar{E}_{\mathrm{PV}}(R)$, involves
taking a limit in $\theta$. It turns out that if the channel satisfies a symmetry condition, then both
upper bounds can be expressed in terms of the information density of an auxiliary channel whose transition
distribution is nothing but the tilted distribution of the original channel distribution; see Observation 4.

We next use Theorem 2 to show that for the memoryless finite-input additive white Gaussian noise
(AWGN) channel, the upper bound $\bar{E}_{\mathrm{PV}}(R)$ is tight, hence yielding an information-spectral formula for
this channel's reliability function: $E^*(R) = \bar{E}_{\mathrm{PV}}(R)$ for all rates $R$ between 0 and channel capacity;
see Theorem 4. The calculation or determination in closed (single-letter) form of $\bar{E}_{\mathrm{PV}}(R)$ is however
a formidable task and remains a notoriously difficult open problem, as it requires solving the optimization of
a large-deviation rate function in addition to two limiting operations; this makes it quite difficult to
compare $\bar{E}_{\mathrm{PV}}(R)$ to well-known lower/upper bounds to $E^*(R)$ (such as the random coding lower bound
and the sphere packing upper bound [9], [5]$^1$) for this AWGN channel. Nevertheless, the above multi-letter
asymptotic expression for $E^*(R)$ may be conceptually useful for the future determination of $E^*(R)$ in
computable single-letter form at low rates.$^2$ We also note that the equality $E^*(R) = \bar{E}_{\mathrm{PV}}(R)$ holds for a
class of channels satisfying the sufficient condition of Theorem 2; see Corollary 1 and the observation accompanying it.

Finally, we provide a lower bound to $E^{(\theta)}_{\mathrm{PV}}(R)$ for the case of memoryless channels, which is computable
for a given value of $\theta$. We use this lower bound to demonstrate numerically that for the memoryless BSC,
$E^{(\theta)}_{\mathrm{PV}}(R)$ is not tight at all rates when $\theta = 1$ (which corresponds to the original Poor-Verdú reliability
function upper bound). We also numerically show that for the memoryless Z-channel, $E^{(\theta)}_{\mathrm{PV}}(R)$ is not tight
at high rates for all considered values of $\theta$ (including large ones).

The rest of the paper is organized as follows. In Section II, the generalized Poor-Verdú lower bound
to the multihypothesis testing minimum error probability is established in terms of the tilted posterior
distribution with parameter $\theta$ (Theorem 1). A sufficient condition under which an exact expression for
the error probability is given in terms of an asymptotic (in $\theta$) term of the bound (minus a multiplying
factor) is also shown (Theorem 2). Examples illustrating Theorems 1 and 2 are provided in Section III. In
Section IV, the two upper bounds, given by $E^{(\theta)}_{\mathrm{PV}}(R)$ and $\bar{E}_{\mathrm{PV}}(R)$, respectively, for the channel reliability
function are proved (Theorem 3). Furthermore, it is noted that $\bar{E}_{\mathrm{PV}}(R)$ provides an exact asymptotic
characterization for the channel reliability function at all rates for the finite-input AWGN channel as well
as other channels (Theorem 4 and Corollary 1). Numerical examples involving the BSC and the Z-channel
indicating the looseness of $E^{(\theta)}_{\mathrm{PV}}(R)$ for specific choices of $\theta$ are next provided. Finally, conclusions are
stated in Section V. Note that we will use the natural logarithm throughout.

$^1$ The sphere packing bound [9] is referred to as the space partitioning bound in [5].

$^2$ For the finite-input AWGN channel as well as the whole class of memoryless channels, $E^*(R)$ is already exactly determined in terms of a
simple (single-letter) expression at high rates (beyond some critical rate) since the random coding and sphere-packing bounds coincide in that
rate region [9]. Further improvements were recently established for the memoryless binary symmetric channel (BSC) and the continuous-input
AWGN channel in [2], [3], where it is shown that $E^*(R)$ is also exactly determined for rates $R$ in some interval directly below the critical
rate.

II. A GENERALIZED ERROR LOWER BOUND FOR MULTIHYPOTHESIS TESTING
We herein generalize the Poor-Verdú lower bound in (1) for the multihypothesis testing error probability.

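Consider two (correlated) random variables $X$ and $Y$, where $X$ has a discrete (i.e., finite or countably
infinite) alphabet $\mathcal{X} = \{x_1, x_2, x_3, \ldots\}$ and $Y$ takes on values in an arbitrary alphabet $\mathcal{Y}$. The minimum
probability of error $P_e$ in estimating $X$ from $Y$ is given by

$$P_e = \Pr[X \ne e(Y)] \tag{2}$$

where $e(Y)$ is the MAP estimate defined as

$$e(y) = \arg\max_{x \in \mathcal{X}} P_{X|Y}(x|y). \tag{3}$$

Theorem 1: The above minimum probability of error $P_e$ in estimating $X$ from $Y$ satisfies the following
inequality

$$P_e \ge (1-\alpha)\, P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) \le \alpha\} \tag{4}$$

for each $\alpha \in [0,1]$ and $\theta \ge 1$, where for each $y \in \mathcal{Y}$,

$$P^{(\theta)}_{X|Y}(x|y) \triangleq \frac{\big(P_{X|Y}(x|y)\big)^{\theta}}{\sum_{x' \in \mathcal{X}} \big(P_{X|Y}(x'|y)\big)^{\theta}} \tag{5}$$

is the tilted distribution of $P_{X|Y}(\cdot|y)$ with parameter $\theta$ [6].

Note: When $\theta = 1$, the above bound in (4) reduces to the Poor-Verdú bound in (1).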
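
For a finite observation alphabet, both sides of (4) can be evaluated by direct enumeration. The following Python sketch is ours and is only illustrative: the function names and the toy joint pmf are assumptions, and it presumes that every column of the joint pmf has positive mass (i.e., $P_Y(y) > 0$ for all $y$).

```python
def tilted_posterior(joint, theta):
    """Return t[x][y] = P^(theta)_{X|Y}(x|y) for a joint pmf stored as joint[x][y]."""
    n_x, n_y = len(joint), len(joint[0])
    tilted = [[0.0] * n_y for _ in range(n_x)]
    for y in range(n_y):
        column = [joint[x][y] for x in range(n_x)]
        peak = max(column)  # assumes P_Y(y) > 0, hence peak > 0
        # Dividing by the peak before powering avoids underflow for large theta;
        # the common factor (and P_Y(y)) cancels in the normalisation.
        powered = [(c / peak) ** theta for c in column]
        total = sum(powered)
        for x in range(n_x):
            tilted[x][y] = powered[x] / total
    return tilted


def generalized_pv_bound(joint, alpha, theta=1.0):
    """Right-hand side of (4): (1 - alpha) * P_{X,Y}{ tilted posterior <= alpha }."""
    tilted = tilted_posterior(joint, theta)
    mass = sum(
        joint[x][y]
        for x in range(len(joint))
        for y in range(len(joint[0]))
        if tilted[x][y] <= alpha
    )
    return (1.0 - alpha) * mass


def map_error_probability(joint):
    """Exact minimum error probability: 1 - sum_y max_x P_{X,Y}(x, y)."""
    n_x, n_y = len(joint), len(joint[0])
    return 1.0 - sum(max(joint[x][y] for x in range(n_x)) for y in range(n_y))


if __name__ == "__main__":
    toy = [[0.4, 0.1], [0.1, 0.4]]  # hypothetical joint pmf, rows indexed by x
    print(map_error_probability(toy), generalized_pv_bound(toy, 0.3, theta=5.0))
```

Setting `theta=1.0` recovers the original bound (1), so the same routine can be used to compare the two bounds on any small example.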

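Proof: Fix $\theta \ge 1$. We only provide the proof for $\alpha < 1$ since the lower bound trivially holds when
$\alpha = 1$.

From (2) and (3), the minimum error probability $P_e$ incurred in testing among the values of $X$ satisfies

\begin{align*}
1 - P_e &= \Pr[X = e(Y)] \\
&= \int_{\mathcal{Y}} P_{X|Y}(e(y)|y)\, dP_Y(y) \\
&= \int_{\mathcal{Y}} \Big( \max_{x \in \mathcal{X}} P_{X|Y}(x|y) \Big)\, dP_Y(y) \\
&= \int_{\mathcal{Y}} \Big( \max_{x \in \mathcal{X}} f_x(y) \Big)\, dP_Y(y) \\
&= E\Big[ \max_{x \in \mathcal{X}} f_x(Y) \Big]
\end{align*}

where $f_x(y) \triangleq P_{X|Y}(x|y)$. For a fixed $y \in \mathcal{Y}$, let $h_j(y)$ be the $j$-th element in the set

$$\{f_{x_1}(y), f_{x_2}(y), f_{x_3}(y), \ldots\}$$

such that its elements are listed in non-increasing order; i.e.,

$$h_1(y) \ge h_2(y) \ge h_3(y) \ge \cdots$$

and

$$\{h_1(y), h_2(y), h_3(y), \ldots\} = \{f_{x_1}(y), f_{x_2}(y), f_{x_3}(y), \ldots\}.$$

Then

$$1 - P_e = E[h_1(Y)]. \tag{6}$$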

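Furthermore, for each $h_j(y)$ above, define $h_j^{(\theta)}(y)$ as the respective element for $h_j(y)$
satisfying

$$h_j(y) = f_{x_i}(y) = P_{X|Y}(x_i|y) \iff h_j^{(\theta)}(y) = P^{(\theta)}_{X|Y}(x_i|y).$$

Since $h_1(y)$ is the largest among $\{h_j(y)\}_{j \ge 1}$,

$$h_1^{(\theta)}(y) = \frac{h_1^{\theta}(y)}{\sum_{j \ge 1} h_j^{\theta}(y)} = \frac{1}{1 + \sum_{j \ge 2} [h_j(y)/h_1(y)]^{\theta}}$$

is non-decreasing in $\theta$ for each $y$; this implies that

$$h_1^{(\theta)}(y) \ge h_1(y) \quad \text{for } \theta \ge 1 \text{ and } y \in \mathcal{Y}. \tag{7}$$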

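For any $\alpha \in [0,1)$, we can write

$$P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) > \alpha\} = \int_{\mathcal{Y}} P_{X|Y}\{x \in \mathcal{X} : P^{(\theta)}_{X|Y}(x|y) > \alpha\}\, dP_Y(y).$$

Noting that

$$P_{X|Y}\{x \in \mathcal{X} : P^{(\theta)}_{X|Y}(x|y) > \alpha\} = \sum_{x \in \mathcal{X}} P_{X|Y}(x|y) \cdot \mathbf{1}\big(P^{(\theta)}_{X|Y}(x|y) > \alpha\big) = \sum_{j=1}^{\infty} h_j(y) \cdot \mathbf{1}\big(h_j^{(\theta)}(y) > \alpha\big),$$

where $\mathbf{1}(\cdot)$ is the indicator function, yields

\begin{align}
P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) > \alpha\}
&= \int_{\mathcal{Y}} \Big( \sum_{j=1}^{\infty} h_j(y) \cdot \mathbf{1}\big(h_j^{(\theta)}(y) > \alpha\big) \Big)\, dP_Y(y) \notag \\
&\ge \int_{\mathcal{Y}} h_1(y) \cdot \mathbf{1}\big(h_1^{(\theta)}(y) > \alpha\big)\, dP_Y(y) \notag \\
&\ge \int_{\mathcal{Y}} h_1(y) \cdot \mathbf{1}\big(h_1(y) > \alpha\big)\, dP_Y(y) \notag \\
&= E[h_1(Y) \cdot \mathbf{1}(h_1(Y) > \alpha)], \tag{8}
\end{align}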



where the second inequality follows from (7). To complete the proof, we next relate $E[h_1(Y) \cdot \mathbf{1}(h_1(Y) > \alpha)]$
with $E[h_1(Y)]$, which is exactly $1 - P_e$. Invoking [12, eq. (19)], we have that for any $\alpha \in [0,1]$ and
any random variable $U$ with $\Pr\{0 \le U \le 1\} = 1$, the following inequality holds with probability one

$$U \le \alpha + (1-\alpha) \cdot U \cdot \mathbf{1}(U > \alpha).$$

Thus

$$E[U] \le \alpha + (1-\alpha)\, E[U \cdot \mathbf{1}(U > \alpha)].$$
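
The pointwise inequality can be checked by splitting on the value of the indicator. The short verification below is our own elementary argument, given for convenience; it is not quoted from [12].

```latex
\begin{align*}
\text{If } U \le \alpha:&\quad \alpha + (1-\alpha)\,U\cdot\mathbf{1}(U>\alpha) = \alpha \ge U,\\
\text{if } U > \alpha:&\quad \alpha + (1-\alpha)\,U - U = \alpha\,(1-U) \ge 0
\quad \text{since } 0 \le U \le 1 .
\end{align*}
```

Taking expectations on both sides then gives the stated bound on $E[U]$.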

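Applying the above inequality to (8) by setting $U = h_1(Y)$, we obtain

\begin{align*}
(1-\alpha)\, P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) > \alpha\}
&\ge (1-\alpha)\, E[h_1(Y) \cdot \mathbf{1}(h_1(Y) > \alpha)] \\
&\ge E[h_1(Y)] - \alpha \\
&= (1 - P_e) - \alpha \\
&= (1-\alpha) - P_e,
\end{align*}

where the first equality follows from (6). Since the probability of the event $\{P^{(\theta)}_{X|Y}(x|y) > \alpha\}$ is one minus that of
$\{P^{(\theta)}_{X|Y}(x|y) \le \alpha\}$, rearranging yields (4). ■

We next show that if the MAP estimate $e(Y)$ of $X$ from $Y$ is almost surely unique in (3), then the
bound of Theorem 1, without the $(1-\alpha)$ factor, is tight in the limit of $\theta$ going to infinity.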

Theorem 2: Consider two random variables $X$ and $Y$, where $X$ has a finite or countably infinite
alphabet $\mathcal{X} = \{x_1, x_2, x_3, \ldots\}$ and $Y$ has an arbitrary alphabet $\mathcal{Y}$. Assume that

$$P_{X|Y}(e(y)|y) > \max_{x \in \mathcal{X} : x \ne e(y)} P_{X|Y}(x|y) \tag{9}$$

holds almost surely in $P_Y$, where $e(y)$ is the MAP estimate from $y$ as defined in (3); in other words, the
MAP estimate is almost surely unique in $P_Y$. Then, the error probability in the MAP estimation of $X$
from $Y$ satisfies

$$P_e = \lim_{\theta \to \infty} P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) \le \alpha\} \tag{10}$$

for each $\alpha \in (0,1)$, where the tilted distribution $P^{(\theta)}_{X|Y}(\cdot|y)$ is given in (5) for $y \in \mathcal{Y}$.

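Proof: It can be easily verified from the definitions of $h_j(\cdot)$ and $h_j^{(\theta)}(\cdot)$ that the following two limits
hold for each $y \in \mathcal{Y}$:

$$\lim_{\theta \to \infty} h_1^{(\theta)}(y) = \frac{1}{\ell(y)},$$

where

$$\ell(y) \triangleq \max\{j \in \mathbb{N} : h_j(y) = h_1(y)\} \tag{11}$$

and $\mathbb{N} = \{1, 2, 3, \ldots\}$ is the set of positive integers, and

$$\lim_{\theta \to \infty} h_j(y) \cdot \mathbf{1}\big(h_j^{(\theta)}(y) > \alpha\big) =
\begin{cases}
h_j(y) \cdot \mathbf{1}\big(\tfrac{1}{\ell(y)} > \alpha\big) & \text{for } j = 1, 2, \ldots, \ell(y) \\
0 & \text{for } j > \ell(y)
\end{cases} \tag{12}$$

where $\mathbf{1}(\cdot)$ is the indicator function.

As a result, we obtain that for any $\alpha \in (0,1)$,

\begin{align}
\lim_{\theta \to \infty} P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) > \alpha\}
&= \int_{\mathcal{Y}} \lim_{\theta \to \infty} \Big( \sum_{j=1}^{\infty} h_j(y) \cdot \mathbf{1}\big(h_j^{(\theta)}(y) > \alpha\big) \Big)\, dP_Y(y) \tag{13} \\
&= \int_{\mathcal{Y}} \Big( \sum_{j=1}^{\ell(y)} h_j(y) \Big) \cdot \mathbf{1}\big(\tfrac{1}{\ell(y)} > \alpha\big)\, dP_Y(y), \tag{14}
\end{align}

where (13) follows from the Dominated Convergence Theorem [4, Thm. 16.4] since

$$\Big| \sum_{j=1}^{\infty} h_j(y) \cdot \mathbf{1}\big(h_j^{(\theta)}(y) > \alpha\big) \Big| \le \sum_{j=1}^{\infty} h_j(y) = 1.$$

Furthermore, (14) holds since the limit (in $\theta$) of

$$a_{\theta,j} \triangleq h_j(y) \cdot \mathbf{1}\big(h_j^{(\theta)}(y) > \alpha\big)$$

exists for every $j = 1, 2, \ldots$ by (12), hence implying (as shown in Appendix A) that

$$\lim_{\theta \to \infty} \sum_{j=1}^{\infty} a_{\theta,j} = \sum_{j=1}^{\infty} \lim_{\theta \to \infty} a_{\theta,j}.$$

Now condition (9) is equivalent to

$$\Pr[\ell(Y) = 1] \triangleq P_Y\{y \in \mathcal{Y} : \ell(y) = 1\} = 1; \tag{15}$$

thus,

\begin{align}
\lim_{\theta \to \infty} P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) > \alpha\}
&= \int_{\mathcal{Y}} h_1(y) \cdot \mathbf{1}(1 > \alpha)\, dP_Y(y) = E[h_1(Y)] \notag \\
&= 1 - P_e, \tag{16}
\end{align}

where (16) follows from (6).

This immediately yields that for $0 < \alpha < 1$,

\begin{align*}
P_e &= 1 - \lim_{\theta \to \infty} P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) > \alpha\} \\
&= \lim_{\theta \to \infty} P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) \le \alpha\}. \qquad ■
\end{align*}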



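Observation 1: We first note that since the bound in (4) holds for every $\theta \ge 1$, it also holds in the
limit of $\theta$ going to infinity (the limit exists as shown in the above proof):

$$P_e \ge (1-\alpha) \lim_{\theta \to \infty} P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) \le \alpha\} \tag{17}$$

for any $0 < \alpha < 1$.

Furthermore, if condition (9) does not hold (or equivalently from (15), if $\Pr[\ell(Y) = 1] < 1$), but there
exists an integer $L > 1$ such that $\Pr[\ell(Y) \le L] = 1$, then using (14), we can write (17) as

\begin{align}
P_e &\ge (1-\alpha) \Big[ \int_{\mathcal{Y}} \Big( \sum_{j=1}^{\infty} h_j(y) \Big)\, dP_Y(y) - \int_{\mathcal{Y}} \Big( \sum_{j=1}^{\ell(y)} h_j(y) \Big) \cdot \mathbf{1}\big(\tfrac{1}{\ell(y)} > \alpha\big)\, dP_Y(y) \Big] \tag{18} \\
&= (1-\alpha) \int_{\{y : \ell(y) = 1\}} \Big( \sum_{j=1}^{1} h_j(y) \cdot \mathbf{1}(1 \le \alpha) + \sum_{j=2}^{\infty} h_j(y) \Big)\, dP_Y(y) \notag \\
&\quad + (1-\alpha) \int_{\{y : \ell(y) = 2\}} \Big( \sum_{j=1}^{2} h_j(y) \cdot \mathbf{1}\big(\tfrac{1}{2} \le \alpha\big) + \sum_{j=3}^{\infty} h_j(y) \Big)\, dP_Y(y) \notag \\
&\quad + \cdots \notag \\
&\quad + (1-\alpha) \int_{\{y : \ell(y) = L\}} \Big( \sum_{j=1}^{L} h_j(y) \cdot \mathbf{1}\big(\tfrac{1}{L} \le \alpha\big) + \sum_{j=L+1}^{\infty} h_j(y) \Big)\, dP_Y(y). \tag{19}
\end{align}

To render this lower bound as large as possible, its formula above indicates that although the multiplicative
constant $(1-\alpha)$ favors a small $\alpha$, the integration term in (18) actually has its smallest value when $\alpha$ is
less than $1/L$ (see (19)). Therefore, a compromise in the choice of $\alpha$ has to be made in order to maximize
the bound.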

III. Examples for the generalized Poor-Verdú bound

In this section, we provide four examples (three of them with a finite observation alphabet and one
with a continuous observation alphabet) to illustrate the results of the previous section.

A. Ternary Hypothesis Testing

We revisit the ternary hypothesis testing example examined in [12, Figs. 1 and 2], where random
variables $X$ and $Y$ have identical alphabets $\mathcal{X} = \mathcal{Y} = \{0, 1, 2\}$, $X$ is uniformly distributed ($P_X(x) =
1/3$ $\forall x \in \mathcal{X}$) and $Y$ is related to $X$ via

$$P_{Y|X}(y|x) =
\begin{cases}
1 - v_1 - v_2 & \text{if } y = x \\
v_1 & \text{if } x = 1 \text{ and } y = 0 \\
v_2 & \text{if } x = 2 \text{ and } y = 0 \\
v_1 & \text{if } y \ne x \text{ and } y = 1 \\
v_2 & \text{if } y \ne x \text{ and } y = 2
\end{cases}$$

where we assume that $1 - v_1 - v_2 > v_2 > v_1 > 0$. In [12], $v_1 = 0.27$ and $v_2 = 0.33$ are used.

A direct calculation reveals that the MAP estimation function (3) for guessing $X$ from $Y$ is given by
$e(y) = y$ for every $y \in \mathcal{Y}$, resulting in a probability of error of $P_e = v_1 + v_2 = 0.6$ when $v_1 = 0.27$ and
$v_2 = 0.33$. Furthermore, we obtain that $P_e$ is exactly determined via

$$\lim_{\theta \to \infty} P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) \le \alpha\} = v_1 + v_2 = P_e,$$

as predicted by Theorem 2, since condition (9) holds (since $\ell(Y) = 1$ almost surely in $P_Y$, where $\ell(\cdot)$ is
defined in (11)).

We next compute the new bound in (4) for $v_1 = 0.27$, $v_2 = 0.33$ and for different values of $\theta \ge 1$ and
plot it in Fig. 1 along with Fano's original bound (referred to as "Fano" in the figure) given by

$$P_e \ge \frac{\log 3 - I(X;Y) - \log 2}{\log 2} = 0.568348,$$

and Fano's weaker (but commonly used) bound

$$P_e \ge 1 - \frac{I(X;Y) + \log 2}{\log 3} = 0.358587$$

shown in [12, Fig. 2] (referred to as "Weakened Fano" in the figure). The case of $\theta = 1$ corresponds
to the original Poor-Verdú bound in (1). As can be seen from the figure, bound (4) for $\theta = 20$ and $100$
improves upon (1) and both Fano bounds and approaches the exact probability of error as $\theta$ is increased
without bound (e.g., for $\theta = 100$ and $\alpha \downarrow 0$, the bound is quite close to $P_e$). In Fig. 2, bounds (4) and
(1), maximized over $\alpha \in [0, 1]$, are plotted versus $\theta$. It is observed that when $\theta > 16$, bound (4) improves
upon (1).
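
The numbers quoted in this example can be reproduced with a few lines of code. The Python sketch below is ours: the helper names and the grid search over $\alpha$ (2001 equally spaced points, an arbitrary choice) are assumptions, so the maximized bound it reports is only a grid approximation of the true maximum.

```python
import math

V1, V2 = 0.27, 0.33  # values used in [12]


def channel(y, x):
    """Transition probability P_{Y|X}(y|x) of the ternary example."""
    if y == x:
        return 1.0 - V1 - V2
    if y == 0:
        return V1 if x == 1 else V2
    return V1 if y == 1 else V2


# Joint pmf JOINT[x][y] under the uniform prior P_X(x) = 1/3.
JOINT = [[channel(y, x) / 3.0 for y in range(3)] for x in range(3)]


def bound(alpha, theta):
    """Right-hand side of (4) for the ternary example."""
    mass = 0.0
    for y in range(3):
        column = [JOINT[x][y] for x in range(3)]
        peak = max(column)
        powered = [(c / peak) ** theta for c in column]  # scaled to avoid underflow
        total = sum(powered)
        mass += sum(c for c, w in zip(column, powered) if w / total <= alpha)
    return (1.0 - alpha) * mass


def best_bound(theta, grid=2000):
    """Approximate maximum of (4) over alpha, using a uniform grid on [0, 1]."""
    return max(bound(k / grid, theta) for k in range(grid + 1))


# Exact MAP error probability.
p_e = 1.0 - sum(max(JOINT[x][y] for x in range(3)) for y in range(3))

# Mutual information I(X;Y) in nats, needed by both Fano bounds.
p_y = [sum(JOINT[x][y] for x in range(3)) for y in range(3)]
mi = sum(
    JOINT[x][y] * math.log(JOINT[x][y] / (p_y[y] / 3.0))
    for x in range(3)
    for y in range(3)
)
fano = (math.log(3) - mi - math.log(2)) / math.log(2)
weak_fano = 1.0 - (mi + math.log(2)) / math.log(3)

if __name__ == "__main__":
    print("P_e =", p_e, "Fano =", fano, "Weakened Fano =", weak_fano)
    for theta in (1, 20, 100):
        print("theta =", theta, "max over alpha of (4) ~", best_bound(theta))
```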
Fig. 1. Lower bounds on the minimum probability of error for Example III-A: bound (4) versus $\alpha$ for $\theta = 1, 20, 100$ and Fano's original
and weakened bounds.

Fig. 2. Lower bounds on the minimum probability of error for Example III-A: bounds (4) and (1) versus $\theta$, optimized over $\alpha$.



B. Binary Erasure Channel 

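Suppose that $X$ and $Y$ are respectively the channel input and output of a BEC with erasure probability
$\varepsilon$, where $\mathcal{X} = \{0, 1\}$ and $\mathcal{Y} = \{0, 1, E\}$. Let $\Pr[X = 0] = 1 - p$ and $\Pr[X = 1] = p$ with $0 < p < 1/2$.
Then, the MAP estimate of $X$ from $Y$ is given by

$$e(y) =
\begin{cases}
y & \text{if } y \in \{0, 1\} \\
0 & \text{if } y = E
\end{cases}$$

and the resulting error probability is $P_e = \varepsilon p$.
Calculating bound (4) of Theorem 1 yields

$$(1-\alpha)\, P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) \le \alpha\} =
\begin{cases}
0 & \text{if } 0 \le \alpha < \dfrac{p^{\theta}}{p^{\theta} + (1-p)^{\theta}} \\[2mm]
\varepsilon p (1-\alpha) & \text{if } \dfrac{p^{\theta}}{p^{\theta} + (1-p)^{\theta}} \le \alpha < \dfrac{(1-p)^{\theta}}{p^{\theta} + (1-p)^{\theta}} \\[2mm]
\varepsilon (1-\alpha) & \text{if } \dfrac{(1-p)^{\theta}}{p^{\theta} + (1-p)^{\theta}} \le \alpha < 1.
\end{cases} \tag{20}$$

Thus, taking $\theta \uparrow \infty$ and then $\alpha \downarrow 0$ in (20) results in the exact error probability $\varepsilon p$. Note that in this
example, the original Poor-Verdú bound (i.e., with $\theta = 1$) also achieves the exact error probability $\varepsilon p$ by
choosing $\alpha = 1 - p$; however, this maximizing choice of $\alpha = 1 - p$ for the original bound is a function of
the system's statistics (here, the input distribution $p$), which is undesirable. On the other hand, the generalized
bound (4) can herein achieve its peak by systematically taking $\theta \uparrow \infty$ and then letting $\alpha \downarrow 0$.

Furthermore, since in this example $\ell(y) = 1$ for every $y \in \{0, 1, E\}$, we have that (9) holds; hence, by
Theorem 2, (10) yields

\begin{align*}
P_e &= \lim_{\theta \to \infty} P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) \le \alpha\} \\
&= \varepsilon p \quad \text{for } 0 < \alpha < 1,
\end{align*}

where the last equality follows directly from (20) without the $(1-\alpha)$ factor.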
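
The piecewise expression (20) can be cross-checked against a direct enumeration of the joint distribution. The following Python sketch is ours (function names and the parameter values in the usage lines are illustrative assumptions); it evaluates the bound both ways for $0 < p < 1/2$.

```python
def bound_by_enumeration(p, eps, alpha, theta):
    """Evaluate the right-hand side of (4) for the single-use BEC by brute force."""
    prior = {0: 1.0 - p, 1: p}
    mass = 0.0
    for y in (0, 1, "E"):
        # Joint probabilities P_{X,Y}(x, y) for x = 0, 1 at this output.
        column = []
        for x in (0, 1):
            if y == "E":
                column.append(prior[x] * eps)          # symbol erased
            elif y == x:
                column.append(prior[x] * (1.0 - eps))  # symbol received intact
            else:
                column.append(0.0)                     # a BEC never flips a bit
        peak = max(column)
        powered = [(c / peak) ** theta for c in column]
        total = sum(powered)
        mass += sum(c for c, w in zip(column, powered) if w / total <= alpha)
    return (1.0 - alpha) * mass


def bound_closed_form(p, eps, alpha, theta):
    """Piecewise expression (20); returns 0 at alpha = 1, where (4) is trivial."""
    denom = p ** theta + (1.0 - p) ** theta
    low, high = p ** theta / denom, (1.0 - p) ** theta / denom
    if alpha < low:
        return 0.0
    if alpha < high:
        return eps * p * (1.0 - alpha)
    if alpha < 1.0:
        return eps * (1.0 - alpha)
    return 0.0


if __name__ == "__main__":
    p, eps = 0.3, 0.2  # hypothetical parameters
    for theta in (1.0, 10.0, 100.0):
        print(theta, bound_closed_form(p, eps, 0.01, theta), "vs P_e =", eps * p)
```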



C. Multiple-Use BEC 

We now extend the previous example of the single-use BEC to the case of using the memoryless BEC $n$
times with an input $n$-tuple $X^n = (X_1, \ldots, X_n)$ of independent and identically distributed (i.i.d.) random
variables $X_i$ with $\Pr[X_i = 1] = p$, where $0 < p < 1/2$. Here again we determine the MAP estimation of
$X^n$ by observing the channel output $Y^n$. For a received output $n$-tuple $y^n$,

$$P_{X^n|Y^n}(x^n|y^n) =
\begin{cases}
(1-p)^{d_{0E}(x^n, y^n)}\, p^{d_{1E}(x^n, y^n)} & \text{if } d_{01}(x^n, y^n) = d_{10}(x^n, y^n) = 0 \\
0 & \text{otherwise}
\end{cases} \tag{21}$$

where $d_{0E}(x^n, y^n)$ is the number of occurrences of $(x_j, y_j) = (0, E)$ in $(x^n, y^n)$, and the other $d$-terms
are defined similarly. The above equation indicates that for a given $y^n$, $P_{X^n|Y^n}(x^n|y^n)$ always peaks for
$d_{1E}(x^n, y^n) = 0$ since $0 < p < 1/2$. Thus the MAP estimator $e(y^n)$ replaces all erasures in $y^n$ by 0's while
keeping the 0's and 1's in $y^n$ unchanged (e.g., if $n = 5$ and $y^n = (0, 0, E, E, 1)$, then $e(y^n) = (0, 0, 0, 0, 1)$).
The resulting probability of error is given by



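\begin{align*}
P_e &= \sum_{k=0}^{n} \sum_{i=0}^{n-k} \binom{n}{k} \binom{n-k}{i}\, \varepsilon^{k} (1-\varepsilon)^{n-k}\, p^{i} (1-p)^{n-k-i} \big[ 1 - (1-p)^{k} \big] \\
&= 1 - (1 - \varepsilon p)^{n}
\end{align*}

where $k$ is the number of erasures $E$ in $y^n$ and $i$ is the number of 1's in $y^n$.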

On the other hand, we directly obtain from (21) that condition (9) holds (or equivalently condition (15),
i.e., $\ell(y^n) = 1$ with probability one in $P_{Y^n}$). We can then apply Theorem 2 to obtain from (10) that

\begin{align*}
P_e &= 1 - (1 - \varepsilon p)^{n} \\
&= \lim_{\theta \to \infty} P_{X^n,Y^n}\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : P^{(\theta)}_{X^n|Y^n}(x^n|y^n) \le \alpha\}.
\end{align*}

We next consider the case of $p = 1/2$, i.e., the input $X^n$ is uniformly distributed. In this case, (21)
yields that

$$h_1(y^n) = h_2(y^n) = \cdots = h_{2^k}(y^n) = 2^{-k}$$

and

$$h_{2^k+1}(y^n) = h_{2^k+2}(y^n) = \cdots = h_{2^n}(y^n) = 0$$

where $k$ is the number of erasures $E$ in $y^n$. Thus $\ell(y^n) = 2^k$ and Theorem 2 no longer applies. Furthermore,
$h_j^{(\theta)}(y^n) = h_j(y^n)$ for every $\theta \ge 1$; this implies that for the uniform-input multiple-use BEC, the
generalized bound (4) does not improve upon the original Poor-Verdú bound (1).
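
For small $n$, the closed form $1 - (1 - \varepsilon p)^n$ can be confirmed by enumerating all input and output $n$-tuples. The Python sketch below is ours; the function name and the parameter values are illustrative assumptions, and the enumeration is exponential in $n$, so it is only meant for very short blocks.

```python
from itertools import product


def map_error_multi_bec(n, p, eps):
    """Exact MAP error for n uses of a BEC with i.i.d. Bernoulli(p) inputs."""
    correct = 0.0
    for y in product((0, 1, "E"), repeat=n):
        best = 0.0
        for x in product((0, 1), repeat=n):
            prob = 1.0  # joint probability P(x^n, y^n), built symbol by symbol
            for xi, yi in zip(x, y):
                prior = p if xi == 1 else 1.0 - p
                if yi == "E":
                    prob *= prior * eps
                elif yi == xi:
                    prob *= prior * (1.0 - eps)
                else:
                    prob = 0.0  # a BEC never flips a bit
                    break
            best = max(best, prob)
        correct += best  # the MAP rule picks the most likely x^n for this y^n
    return 1.0 - correct


if __name__ == "__main__":
    n, p, eps = 3, 0.3, 0.2  # hypothetical parameters
    print(map_error_multi_bec(n, p, eps), 1.0 - (1.0 - eps * p) ** n)
```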

D. Binary Input Observed in Gaussian Noise 

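We herein consider an example with a continuous observation alphabet $\mathcal{Y} = \mathbb{R}$, where $\mathbb{R}$ is the set of
real numbers. Specifically, let the observation be given by $Y = X + N$, where $X$ is uniformly distributed
over $\mathcal{X} = \{-1, +1\}$ and $N$ is a zero-mean Gaussian random variable with variance $\sigma^2$. Assuming that
$X$ and $N$ are independent from each other, then

\begin{align}
P_{X|Y}(x|y) &= \frac{\frac{1}{2} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\big\{-\frac{(y-x)^2}{2\sigma^2}\big\}}{\frac{1}{2} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\big\{-\frac{(y-1)^2}{2\sigma^2}\big\} + \frac{1}{2} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\big\{-\frac{(y+1)^2}{2\sigma^2}\big\}} \notag \\
&= \frac{\exp\{xy/\sigma^2\}}{\exp\{y/\sigma^2\} + \exp\{-y/\sigma^2\}} = \frac{1}{1 + \exp\{-2xy/\sigma^2\}} \tag{22}
\end{align}

for $x \in \{-1, +1\}$, $y \in \mathbb{R}$. This directly implies that the MAP estimate of $X$ from $Y$ is given by
$e(y) = +1$ if $y > 0$ and $e(y) = -1$ if $y \le 0$. The resulting error probability is $P_e = \Phi(-1/\sigma)$, where
$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} \exp\{-t^2/2\}\, dt$ is the cdf of the standard (zero-mean unit-variance) Gaussian distribution.
Furthermore, since $x \in \{-1, +1\}$, we have

$$P^{(\theta)}_{X|Y}(x|y) = \frac{\Big( \frac{\exp\{xy/\sigma^2\}}{\exp\{y/\sigma^2\} + \exp\{-y/\sigma^2\}} \Big)^{\theta}}{\Big( \frac{\exp\{y/\sigma^2\}}{\exp\{y/\sigma^2\} + \exp\{-y/\sigma^2\}} \Big)^{\theta} + \Big( \frac{\exp\{-y/\sigma^2\}}{\exp\{y/\sigma^2\} + \exp\{-y/\sigma^2\}} \Big)^{\theta}} = \frac{1}{1 + \exp\{-2\theta xy/\sigma^2\}},$$

and the generalized Poor-Verdú bound (4) yields

\begin{align}
P_e &\ge (1-\alpha)\, P_{X,Y}\{(x,y) \in \mathcal{X}\times\mathcal{Y} : P^{(\theta)}_{X|Y}(x|y) \le \alpha\} \notag \\
&= (1-\alpha)\, P_X(-1) \int_{\{y \,:\, 1/(1+\exp\{2\theta y/\sigma^2\}) \le \alpha\}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{-\frac{(y+1)^2}{2\sigma^2}\Big\}\, dy \notag \\
&\quad + (1-\alpha)\, P_X(1) \int_{\{y \,:\, 1/(1+\exp\{-2\theta y/\sigma^2\}) \le \alpha\}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{-\frac{(y-1)^2}{2\sigma^2}\Big\}\, dy \notag \\
&= (1-\alpha) \int_{\frac{\sigma^2}{2\theta} \log(\frac{1}{\alpha} - 1)}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{-\frac{(y+1)^2}{2\sigma^2}\Big\}\, dy \notag \\
&= (1-\alpha)\, \Phi\Big( -\frac{\sigma}{2\theta} \log\Big(\frac{1}{\alpha} - 1\Big) - \frac{1}{\sigma} \Big). \tag{23}
\end{align}

Now taking the limits $\theta \to \infty$ followed by $\alpha \downarrow 0$ for the right-hand side term in (23) yields exactly
$\Phi(-1/\sigma) = P_e$; hence the generalized Poor-Verdú bound (4) is asymptotically tight. The bound is illustrated
in Fig. 3 for $\sigma = 0.429858$, which gives $P_e = 0.01$. It can be seen that for $\theta = 100$ and $\alpha \downarrow 0$, bound (4)
is quite close to $P_e$. Finally note that (22) directly ascertains that condition (9) of Theorem 2 holds; thus
$P_e$ is given by (10).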
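
The closed form (23) is straightforward to evaluate numerically. The Python sketch below is ours: it assumes the expression in (23) as reconstructed above, uses the standard-library error function to compute $\Phi$, and sets the bound to zero at the endpoints $\alpha = 0$ and $\alpha = 1$, where (4) is trivial.

```python
import math

SIGMA = 0.429858  # noise standard deviation used for Fig. 3


def std_normal_cdf(z):
    """Standard Gaussian cdf, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_bound(alpha, theta, sigma=SIGMA):
    """(1 - alpha) * Phi(-(sigma / (2 theta)) * log(1/alpha - 1) - 1/sigma), cf. (23)."""
    if alpha <= 0.0 or alpha >= 1.0:
        return 0.0  # the bound vanishes at both endpoints
    shift = sigma / (2.0 * theta) * math.log(1.0 / alpha - 1.0)
    return (1.0 - alpha) * std_normal_cdf(-shift - 1.0 / sigma)


p_e = std_normal_cdf(-1.0 / SIGMA)  # exact MAP error probability

if __name__ == "__main__":
    print("P_e =", p_e)
    for theta in (1.0, 10.0, 100.0):
        print(theta, [gaussian_bound(a, theta) for a in (0.001, 0.01, 0.1, 0.5)])
```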
Fig. 3. Example III-D: bound (4) versus $\alpha$ for $\theta = 1, 10, 100$; $\sigma = 0.429858$ and $P_e = 0.01$.

IV. Channel reliability function 
We next use the results of Section II to study the channel reliability function.



A. Preliminaries 

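Consider an arbitrary input process $\boldsymbol{X}$ defined by a sequence of finite-dimensional distributions [14],
[10]

$$\boldsymbol{X} \triangleq \big\{X^n = \big(X_1^{(n)}, \ldots, X_n^{(n)}\big)\big\}_{n=1}^{\infty}.$$

Denote by

$$\boldsymbol{Y} \triangleq \big\{Y^n = \big(Y_1^{(n)}, \ldots, Y_n^{(n)}\big)\big\}_{n=1}^{\infty}$$

the corresponding output process induced by $\boldsymbol{X}$ via a general channel with memory

$$\boldsymbol{W} \triangleq \big\{W^n = P_{Y^n|X^n} : \mathcal{X}^n \to \mathcal{Y}^n\big\}_{n=1}^{\infty},$$

which is an arbitrary sequence of $n$-dimensional conditional distributions from $\mathcal{X}^n$ to $\mathcal{Y}^n$, where $\mathcal{X}$ and
$\mathcal{Y}$ are the input and output alphabets, respectively.

We assume throughout this section that $\mathcal{X}$ is finite and that $\mathcal{Y}$ is arbitrary. Note though that for the
sake of clarity, we adopt the notations of a discrete probability space for $\mathcal{Y}$ with the usual caveats (such
as replacing summations with integrals and working with the appropriate probability measures, e.g., see
[10, Remark 3.2.1]).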
Definition 1 (Channel block code): An $(n, M)$ code $\mathcal{C}_n$ for channel $\boldsymbol{W}$ with input alphabet $\mathcal{X}$ and
output alphabet $\mathcal{Y}$ is a pair of maps $(f, g)$, where

$$f : \{1, 2, \ldots, M\} \to \mathcal{X}^n$$

is the encoding function yielding codewords $f(1), f(2), \ldots, f(M) \in \mathcal{X}^n$, each of length $n$, and

$$g : \mathcal{Y}^n \to \{1, 2, \ldots, M\}$$

is the decoding function. The set of the $M$ codewords is called the codebook and we also usually write
$\mathcal{C}_n = \{f(1), f(2), \ldots, f(M)\}$ to list the codewords.

The set $\{1, 2, \ldots, M\}$ is called the message set and we assume that a message $V$ is drawn from
the message set according to the uniform distribution. To convey message $V$ over channel $\boldsymbol{W}$, its
corresponding codeword $X^n = f(V)$ is sent over the channel. Then $Y^n$ is received at the channel output
and $\hat{V} = g(Y^n)$ is yielded as the message estimate.

The code's average error probability (or average probability of decoding error) is given by

$$P_e(\mathcal{C}_n) \triangleq \frac{1}{M} \sum_{m=1}^{M} \Pr[\hat{V} \ne V \mid V = m].$$

Since message $V$ is uniformly distributed over $\{1, 2, \ldots, M\}$, we have that $P_e(\mathcal{C}_n) = \Pr[\hat{V} \ne V]$.

Definition 2 (Channel reliability function [12]): For any $R > 0$, define the channel reliability function
$E^*(R)$ for a channel $\boldsymbol{W}$ as the largest scalar $\beta > 0$ such that there exists a sequence of $\mathcal{C}_n = (n, M_n)$
codes with$^3$

$$\beta \le \liminf_{n \to \infty} -\frac{1}{n} \log P_e(\mathcal{C}_n)$$

and

$$R < \liminf_{n \to \infty} \frac{1}{n} \log M_n. \tag{24}$$

Observation 2: We have adopted the above definition of channel reliability function from [12] for the
sake of consistency. Note that this definition is not exactly identical to the traditional definition of the
channel reliability function. If $P_{e,\min}(n, R)$ denotes the probability of error of the best $(n, \lceil 2^{nR} \rceil)$ code
(i.e., the code with smallest error probability) for channel $\boldsymbol{W}$, then the channel's reliability function is
traditionally defined as$^4$

$$E(R) \triangleq \liminf_{n \to \infty} -\frac{1}{n} \log P_{e,\min}(n, R).$$

However, the following relation can be shown between $E^*(R)$ and $E(R)$:

$$E(R) \ge E^*(R) \ge \lim_{\delta \downarrow 0} E(R + \delta).$$

Thus the above two definitions are equivalent except possibly for discontinuity rate points (of which there
are at most countably many as $E^*(R)$ and $E(R)$ are non-increasing in $R$).

$^3$ When no $\beta > 0$ satisfies the definition, we simply set $E^*(R) = 0$.

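Definition 3 ([14]): Given that $Y^n$ is the output of channel $W^n = P_{Y^n|X^n}$ due to input $X^n$ with
distribution $P_{X^n}$, the channel information density is defined as

$$i_{X^n W^n}(x^n; y^n) \triangleq \log \frac{P_{Y^n|X^n}(y^n|x^n)}{P_{Y^n}(y^n)} = \log \frac{P_{Y^n|X^n}(y^n|x^n)}{\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n) P_{Y^n|X^n}(y^n|\hat{x}^n)} \tag{25}$$

for $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$.

Definition 4: Fix $R > 0$. For an input $\boldsymbol{X}$ and a channel $\boldsymbol{W}$,

$$\pi_{\boldsymbol{X}}(R) \triangleq \liminf_{n \to \infty} -\frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} i_{X^n W^n}(x^n; y^n) \le R\Big\} \tag{26}$$

is called a large-deviation rate function for the normalized information density $\frac{1}{n} i_{X^n W^n}(\cdot\,;\cdot)$.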

Proposition 1 (Poor-Verdú upper bound to $E^*(R)$): For a given channel $\boldsymbol{W}$, its reliability function
$E^*(R)$ satisfies [12, Eq. (14)], [1, Theorem 1]

$$E^*(R) \le \sup_{\boldsymbol{X}} \pi_{\boldsymbol{X}}(R) \tag{27}$$

for any $R > 0$, where $\pi_{\boldsymbol{X}}(R)$ is the large-deviation rate function for $\frac{1}{n} i_{X^n W^n}(\cdot\,;\cdot)$ as defined in (26).

Furthermore, the bound in (27) can be slightly tightened by restricting the supremum operation over a
smaller set of inputs [1, Corollary 1]:

$$E^*(R) \le E_{\mathrm{PV}}(R) \triangleq \sup_{\boldsymbol{X} \in Q(R)} \pi_{\boldsymbol{X}}(R), \tag{28}$$

for any $R > 0$, where

$$Q(R) \triangleq \Big\{ \boldsymbol{X} : \text{Each } X^n \text{ in } \boldsymbol{X} \text{ is uniformly distributed over its support } S(X^n) \text{ and } R < \liminf_{n \to \infty} \frac{1}{n} \log |S(X^n)| \Big\}. \tag{29}$$

$^4$ The limit supremum is also commonly used instead of the limit infimum in the definition of $E(R)$, e.g., see [9, p. 160]. We could have
also used the limit supremum in the inequality on $\beta$ in Definition 2; in that case the results of this section would still hold by replacing
$\liminf_n$ with $\limsup_n$ in Theorems 3 and 4 and Corollary 1.

B. Upper Bounds for the Channel Reliability Function

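Using Theorem 1, we provide a lower bound for the probability of decoding error of any $(n, M)$ channel
code and establish two information-spectrum upper bounds for the channel reliability function.

Theorem 3: Every $\mathcal{C}_n = (n, M)$ code for channel $\boldsymbol{W}$ has its probability of decoding error satisfying

$$P_e(\mathcal{C}_n) \ge (1-\alpha)\, P_{X^n W^n}\big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : j^{(\theta)}_{X^n W^n}(x^n; y^n) \le \log(M\alpha)\big\} \tag{30}$$

for every $\alpha \in [0, 1]$ and $\theta \ge 1$, where channel input $X^n$ places probability mass $1/M$ on each codeword
of $\mathcal{C}_n$ and

$$j^{(\theta)}_{X^n W^n}(x^n; y^n) \triangleq \log \frac{P^{\theta}_{Y^n|X^n}(y^n|x^n)}{\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n) P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n)}. \tag{31}$$

Furthermore, the channel's reliability function satisfies

\begin{align}
E^*(R) &\le \sup_{\boldsymbol{X} \in Q(R)} \liminf_{n \to \infty} -\frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^n W^n}(x^n; y^n) \le R\Big\} \notag \\
&\triangleq E^{(\theta)}_{\mathrm{PV}}(R) \tag{32}
\end{align}

for every $R > 0$ and $\theta \ge 1$, and

\begin{align}
E^*(R) &\le \sup_{\boldsymbol{X} \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^n W^n}(x^n; y^n) \le R\Big\} \notag \\
&\triangleq \bar{E}_{\mathrm{PV}}(R) \tag{33}
\end{align}

for every $R > 0$, where the set $Q(R)$ is given in (29).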
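
To make the finite-blocklength bound (30) concrete, the following Python sketch (ours) evaluates it for a small code used over a memoryless BSC and compares it with the exact maximum-likelihood error probability. The codebook, the crossover probability `delta` and all function names are hypothetical choices made for illustration, and the sketch requires $0 < \alpha \le 1$ and $0 < \delta < 1$ so that the logarithms are defined.

```python
import math
from itertools import product


def bsc_likelihood(x, y, delta):
    """P_{Y^n|X^n}(y|x) for a memoryless BSC with crossover probability delta."""
    flips = sum(a != b for a, b in zip(x, y))
    return delta ** flips * (1.0 - delta) ** (len(x) - flips)


def tilted_density(x, y, code, delta, theta):
    """j^(theta)(x^n; y^n) of (31) when X^n is uniform over the codebook."""
    num = bsc_likelihood(x, y, delta) ** theta
    den = sum(bsc_likelihood(c, y, delta) ** theta for c in code) / len(code)
    return math.log(num / den)


def theorem3_bound(code, delta, alpha, theta):
    """Right-hand side of (30) with equiprobable codewords (0 < alpha <= 1)."""
    n, m = len(code[0]), len(code)
    threshold = math.log(m * alpha)
    mass = 0.0
    for y in product((0, 1), repeat=n):
        for x in code:
            if tilted_density(x, y, code, delta, theta) <= threshold:
                mass += bsc_likelihood(x, y, delta) / m  # P_{X^n W^n}(x, y)
    return (1.0 - alpha) * mass


def ml_error(code, delta):
    """Exact error probability of maximum-likelihood decoding of the code."""
    n, m = len(code[0]), len(code)
    correct = sum(
        max(bsc_likelihood(c, y, delta) for c in code)
        for y in product((0, 1), repeat=n)
    )
    return 1.0 - correct / m


if __name__ == "__main__":
    code = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
    delta = 0.1
    print("ML error:", ml_error(code, delta))
    for theta in (1.0, 5.0, 20.0):
        print(theta, max(theorem3_bound(code, delta, a / 50, theta) for a in range(1, 51)))
```

Since no decoder can beat maximum-likelihood decoding for equiprobable messages, the printed bound values should never exceed the printed error probability.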

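Proof: When the channel input $X^n$ is uniformly distributed over the code $\mathcal{C}_n \subseteq \mathcal{X}^n$ of size $M$, the
tilted distribution $P^{(\theta)}_{X^n|Y^n}$ of Theorem 1 becomes

\begin{align}
P^{(\theta)}_{X^n|Y^n}(x^n|y^n) &= \frac{P^{\theta}_{X^n}(x^n) P^{\theta}_{Y^n|X^n}(y^n|x^n) / P^{\theta}_{Y^n}(y^n)}{\sum_{\hat{x}^n \in \mathcal{C}_n} P^{\theta}_{X^n}(\hat{x}^n) P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n) / P^{\theta}_{Y^n}(y^n)} \notag \\
&= \frac{P^{\theta}_{Y^n|X^n}(y^n|x^n) / M}{\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n) P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n)} \notag \\
&= \frac{1}{M} \exp\big\{ j^{(\theta)}_{X^n W^n}(x^n; y^n) \big\} \tag{34}
\end{align}

for all $x^n \in \mathcal{C}_n$. Hence inequality (30) follows directly from Theorem 1 and (34). We next prove (33);
the proof of (32) is identical by omitting the limit over $\theta$. Setting $\alpha = e^{-n\gamma}$ (with $\gamma > 0$) in (30) yields

\begin{align*}
-\frac{1}{n} \log P_e(\mathcal{C}_n) &\le -\frac{1}{n} \log\big(1 - e^{-n\gamma}\big) \\
&\quad - \frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^n W^n}(x^n; y^n) \le \frac{1}{n} \log M - \gamma\Big\}
\end{align*}

which implies in light of (17)

$$\liminf_{n \to \infty} -\frac{1}{n} \log P_e(\mathcal{C}_n) \le \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^n W^n}(x^n; y^n) \le \frac{1}{n} \log M - \gamma\Big\}.$$

We can then conclude by definition of the channel reliability function that

\begin{align*}
E^*(R) &= \sup_{\{\mathcal{C}_n = S(X^n)\}_{n \ge 1} \,:\, \boldsymbol{X} \in Q(R)} \liminf_{n \to \infty} -\frac{1}{n} \log P_e(\mathcal{C}_n) \\
&\le \sup_{\boldsymbol{X} \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^n W^n}(x^n; y^n) \le \frac{1}{n} \log |S(X^n)| - \gamma\Big\}.
\end{align*}

When considering only the sequence of codes in $Q(R)$, we can replace $\frac{1}{n} \log |S(X^n)| - \gamma$ by $R$ (if $\gamma$ is
chosen to be small enough such that $R < \liminf_{n \to \infty} \frac{1}{n} \log |S(X^n)| - \gamma$ is valid for the considered input
$\boldsymbol{X}$) as such a replacement can only (ultimately) increase the upper bound; we thus obtain

$$E^*(R) \le \sup_{\boldsymbol{X} \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^n W^n}(x^n; y^n) \le R\Big\}. \qquad ■$$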

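Observation 3: When $\theta = 1$, $j^{(\theta)}_{X^n W^n}(x^n; y^n)$ in (31) reduces to

$$\log \frac{P_{Y^n|X^n}(y^n|x^n)}{\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n) P_{Y^n|X^n}(y^n|\hat{x}^n)},$$

which is the channel information density as defined in (25).

In this case, the generalized upper bound for the channel reliability function $E^{(\theta)}_{\mathrm{PV}}(R)$ of (32) reduces
to the Poor-Verdú upper bound $E_{\mathrm{PV}}(R)$ of (28) (as expected, since for $\theta = 1$, (4) reduces to (1)).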

Observation 4: Note that when $\theta > 1$, the denominator of the fraction in (31) (in other words,
$\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n) P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n)$) is not a legitimate distribution since it does not sum to one over $y^n \in \mathcal{Y}^n$.
However, if

$$\sum_{y^n \in \mathcal{Y}^n} P^{\theta}_{Y^n|X^n}(y^n|x^n) = \sum_{y^n \in \mathcal{Y}^n} P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n) \quad \forall\, x^n, \hat{x}^n \in \mathcal{X}^n,\ n = 1, 2, \ldots, \tag{35}$$

then $j^{(\theta)}_{X^n W^n}(x^n; y^n)$ can be reformulated as follows

\begin{align}
j^{(\theta)}_{X^n W^n}(x^n; y^n) &= \log \frac{P^{\theta}_{Y^n|X^n}(y^n|x^n) \Big/ \sum_{\hat{y}^n \in \mathcal{Y}^n} P^{\theta}_{Y^n|X^n}(\hat{y}^n|x^n)}{\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n) \Big[ P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n) \Big/ \sum_{\hat{y}^n \in \mathcal{Y}^n} P^{\theta}_{Y^n|X^n}(\hat{y}^n|\hat{x}^n) \Big]} \notag \\
&= \log \frac{P^{(\theta)}_{Y^n|X^n}(y^n|x^n)}{\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n) P^{(\theta)}_{Y^n|X^n}(y^n|\hat{x}^n)} \tag{36}
\end{align}

where for each $y^n \in \mathcal{Y}^n$,

$$P^{(\theta)}_{Y^n|X^n}(y^n|x^n) \triangleq \frac{P^{\theta}_{Y^n|X^n}(y^n|x^n)}{\sum_{\hat{y}^n \in \mathcal{Y}^n} P^{\theta}_{Y^n|X^n}(\hat{y}^n|x^n)}$$

is the tilted distribution with parameter $\theta$ of the channel statistics $P_{Y^n|X^n}(\cdot|x^n)$. Note that $P^{(\theta)}_{Y^n|X^n}$ is a
legitimate distribution (like $P^{(\theta)}_{X|Y}$ defined in Theorem 1). As a result, the new denominator of the fraction
in (36) (i.e., $\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n) P^{(\theta)}_{Y^n|X^n}(y^n|\hat{x}^n)$) is a true distribution on $\mathcal{Y}^n$; it is indeed the distribution
of the output due to an input with distribution $P_{X^n}$ sent over a channel with (legitimate) tilted statistics
$P^{(\theta)}_{Y^n|X^n}$. We thus conclude that for channels satisfying the invariance condition of (35), the upper bounds
for the channel reliability function in (32) and (33) are actually based on the channel information density
of an auxiliary channel whose transition probability $P^{(\theta)}_{Y^n|X^n}$ is the tilted counterpart of the
original channel transition probability $P_{Y^n|X^n}$.

When the output alphabet is finite, the channel $\boldsymbol{W}$ satisfies (35) if it is row-symmetric, i.e., if the rows
of its transition matrix $[p_{x^n, y^n}]$ of size $|\mathcal{X}^n| \times |\mathcal{Y}^n|$, where $p_{x^n, y^n} \triangleq P_{Y^n|X^n}(y^n|x^n)$, are permutations of
each other for each $n$. Note that channels whose transition matrix $[p_{x^n, y^n}]$ is symmetric in the Gallager
sense [9, p. 94] for each $n$ are row-symmetric; such channels include the memoryless BSC and BEC.
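
For a memoryless channel with a finite output alphabet, the $n$-letter sums in (35) factor into products of single-letter sums $\sum_{y} P^{\theta}_{Y|X}(y|x)$, so it suffices to check the invariance for $n = 1$. The Python sketch below is ours; the function names, the tolerance, and the three transition matrices (with hypothetical parameter values) are assumptions chosen for illustration.

```python
def power_sums(matrix, theta):
    """Row-wise sums over y of P_{Y|X}(y|x)^theta for a single-letter transition matrix."""
    return [sum(p ** theta for p in row) for row in matrix]


def satisfies_invariance(matrix, theta, tol=1e-12):
    """Single-letter version of (35): all rows give the same sum of theta-th powers."""
    sums = power_sums(matrix, theta)
    return max(sums) - min(sums) <= tol


# Rows are indexed by the input symbol, columns by the output symbol.
BSC = [[0.9, 0.1], [0.1, 0.9]]
BEC = [[0.8, 0.0, 0.2], [0.0, 0.8, 0.2]]  # outputs ordered as (0, 1, E)
Z_CHANNEL = [[1.0, 0.0], [0.3, 0.7]]      # rows are not permutations of each other

if __name__ == "__main__":
    for name, matrix in (("BSC", BSC), ("BEC", BEC), ("Z", Z_CHANNEL)):
        print(name, satisfies_invariance(matrix, theta=2.5))
```

Row-symmetric matrices pass the check for every $\theta$, since permuting the entries of a row does not change the sum of their $\theta$-th powers.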

When the output alphabet is continuous (i.e., with $\mathcal{Y} = \mathbb{R}$) and the channel is described by a sequence of
$n$-dimensional transition (conditional) probability density functions (pdfs) $f_{Y^n|X^n}$, the invariance condition
of (35) translates into

$$\int_{\mathbb{R}^n} f^{\theta}_{Y^n|X^n}(y^n|x^n)\, dy_1 \cdots dy_n = \int_{\mathbb{R}^n} f^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n)\, dy_1 \cdots dy_n \tag{37}$$

$\forall\, x^n, \hat{x}^n \in \mathcal{X}^n$, $n = 1, 2, \ldots$. The memoryless finite-input AWGN channel and the memoryless binary-input
(with $\mathcal{X} = \{-1, +1\}$) output-symmetric channel, i.e., whose transition pdf satisfies $f_{Y|X}(y|-1) =
f_{Y|X}(-y|+1)$ $\forall\, y \in \mathbb{R}$, fulfill (37).
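Observation 5: It can be shown along similar lines as the proof of [1, Theorem 1] that one can
interchange the supremum and limit infimum (over $n$) in $E^{(\theta)}_{\mathrm{PV}}(R)$ and $\bar{E}_{\mathrm{PV}}(R)$ and obtain

$$\lim_{\gamma \downarrow 0} \tilde{E}^{(\theta)}_{\mathrm{PV}}(R + \gamma) \le E^{(\theta)}_{\mathrm{PV}}(R) \le \tilde{E}^{(\theta)}_{\mathrm{PV}}(R) \quad \text{and} \quad \lim_{\gamma \downarrow 0} \tilde{E}_{\mathrm{PV}}(R + \gamma) \le \bar{E}_{\mathrm{PV}}(R) \le \tilde{E}_{\mathrm{PV}}(R), \tag{38}$$

where

$$\tilde{E}_{\mathrm{PV}}(R) \triangleq \liminf_{n \to \infty} \sup_{X^n \in Q_n(R)} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^n W^n}(x^n; y^n) \le R\Big\},$$

$$\tilde{E}^{(\theta)}_{\mathrm{PV}}(R) \triangleq \liminf_{n \to \infty} \sup_{X^n \in Q_n(R)} -\frac{1}{n} \log P_{X^n W^n}\Big\{(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^n W^n}(x^n; y^n) \le R\Big\}$$

and

$$Q_n(R) \triangleq \Big\{ X^n : P_{X^n}(x^n) = \frac{1}{|S(X^n)|} \text{ for } x^n \in S(X^n) \text{ and } R < \frac{1}{n} \log |S(X^n)| \Big\}.$$

The new expressions that take the supremum over $Q_n(R)$ before letting $n$ approach infinity provide an
alternative possibility for the evaluation of the two bounds. In particular, $Q_n(R)$ becomes a finite set as
the input alphabet is finite; hence, taking the supremum over $Q_n(R)$ can be replaced with a maximization
operation. Inequality (38) nevertheless implies that $\tilde{E}^{(\theta)}_{\mathrm{PV}}(R) = E^{(\theta)}_{\mathrm{PV}}(R)$ and $\tilde{E}_{\mathrm{PV}}(R) = \bar{E}_{\mathrm{PV}}(R)$ almost
everywhere in $R$ (since these functions are non-increasing in $R$).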

C. Information-Spectral Characterization of the Reliability Function for a Class of Channels 

We next employ Theorem 2 to show that the upper bound in (33) is tight for the memoryless finite-
input AWGN channel as well as a larger class of channels, hence providing an information-spectral
characterization for the reliability function of these channels. This exact expression $E^*(R) = \bar{E}_{PV}(R)$
holds for all rates $R$ (below channel capacity), albeit its determination in single-letter form (i.e., solving
the optimization of a large-deviation rate function) remains a challenging open problem.

We first focus on the Gaussian channel and then present the result for a wider class of channels.
Consider a finite-input AWGN channel described by $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, where $X_i$, $Y_i$ and $Z_i$ are
the channel's input, output and noise at time $i$, respectively. We assume that the noise process $Z$ is i.i.d.
with each $Z_i$ being a zero-mean Gaussian random variable with variance $\sigma^2 > 0$. We also assume that
the noise and input processes are independent of each other.
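Theorem 4: The channel reliability function $E^*(R)$ of the above finite-input AWGN channel satisfies

$$E^*(R) = \bar{E}_{PV}(R)$$

$$= \sup_{X \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R \right\}$$

$$= \sup_{X \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} i^{(\theta)}_{X^nW^n}(x^n; y^n) \le R \right\}$$

for any $0 < R < C$, where $C$ denotes the channel's capacity, and $j^{(\theta)}_{X^nW^n}(x^n; y^n)$ and $i^{(\theta)}_{X^nW^n}(x^n; y^n)$ are
given in (31) and (36), respectively.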

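Proof: Fix $0 < R < C$. Let the channel input $X^n$ be uniformly distributed over a codebook $\mathcal{C}_n \subseteq \mathcal{X}^n$
and let $Y^n$ be the corresponding channel output. Then, for $x^n \in \mathcal{C}_n$,

$$P_{X^n|Y^n}(x^n|y^n) = \frac{1}{|\mathcal{C}_n|\, f_{Y^n}(y^n)} \cdot \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{\|y^n - x^n\|^2}{2\sigma^2} \right\},$$

where $f_{Y^n}$ denotes the pdf of $Y^n$ and $\|\cdot\|$ denotes the Euclidean norm. For a given $y^n$ received at the channel
output, if $\ell(y^n)$ as defined in (11) is greater than or equal to 2, then there exist distinct codewords $x^n$ and
$\tilde{x}^n$ in $\mathcal{C}_n$ such that

$$\|y^n - x^n\|^2 = \|y^n - \tilde{x}^n\|^2, \quad \text{equivalently} \quad \sum_{i=1}^{n} (x_i - \tilde{x}_i)\, y_i = \frac{1}{2} \sum_{i=1}^{n} (x_i^2 - \tilde{x}_i^2);$$

hence such $y^n$ belongs to an (affine) hyperplane in $\mathbb{R}^n$. In other words, we have that

$$\{ y^n \in \mathbb{R}^n : \ell(y^n) \ge 2 \} \subseteq \mathcal{Y}(\mathcal{C}_n),$$

where

$$\mathcal{Y}(\mathcal{C}_n) \triangleq \left\{ y^n \in \mathbb{R}^n : \|y^n - x^n\|^2 = \|y^n - \tilde{x}^n\|^2 \text{ for some } x^n, \tilde{x}^n \in \mathcal{C}_n \text{ with } x^n \ne \tilde{x}^n \right\}$$

consists of the union of $\binom{|\mathcal{C}_n|}{2}$ hyperplanes in $\mathbb{R}^n$. But as the Lebesgue measure of every hyperplane
in $\mathbb{R}^n$ is zero, the above finite union of hyperplanes has Lebesgue measure zero. Thus,
$P_{Y^n}\{\mathcal{Y}(\mathcal{C}_n)\} = 0$, which directly yields that $\Pr[\ell(Y^n) \ge 2] = 0$, and hence
$\Pr[\ell(Y^n) = 1] = 1$. Theorem 2 then implies that

$$P_e(\mathcal{C}_n) = \lim_{\theta \to \infty} P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le \frac{1}{n} \log |\mathcal{C}_n| + \frac{1}{n} \log \alpha \right\}$$

for $\alpha \in [0, 1)$. As a result, with $\alpha = e^{-n\gamma}$ for arbitrarily small $\gamma > 0$,

$$\liminf_{n \to \infty} -\frac{1}{n} \log P_e(\mathcal{C}_n) = \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le \frac{1}{n} \log |\mathcal{C}_n| - \gamma \right\},$$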
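where $j^{(\theta)}_{X^nW^n}(x^n; y^n)$ is as defined in (31). As stated in the proof of Theorem 3, the channel input that
achieves the channel reliability should have the chosen $\gamma$ and supports satisfying $\liminf_{n \to \infty} \frac{1}{n} \log |S(X^n)| - \gamma$
strictly larger than but arbitrarily close to $R$. This leads to

$$E^*(R) = \sup_{\{\mathcal{C}_n = S(X^n)\}_{n \ge 1} :\, X \in Q(R)} \liminf_{n \to \infty} -\frac{1}{n} \log P_e(\mathcal{C}_n)$$

$$= \sup_{X \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R \right\}.$$

Furthermore, since this channel satisfies (37), we can replace $j^{(\theta)}_{X^nW^n}(x^n; y^n)$ with $i^{(\theta)}_{X^nW^n}(x^n; y^n)$ in the
expression of $\bar{E}_{PV}(R)$ as shown in Observation 4 to obtain that

$$E^*(R) = \sup_{X \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} i^{(\theta)}_{X^nW^n}(x^n; y^n) \le R \right\}.$$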

■ 
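The mechanism behind the proof, namely that the tilted-posterior probability converges to the exact MAP error probability when the maximizer of the posterior is unique, can be illustrated on a toy discrete example. The sketch below assumes that the tilted posterior is $P^{(\theta)}_{X|Y}(x|y) = P^{\theta}_{X|Y}(x|y) / \sum_{x'} P^{\theta}_{X|Y}(x'|y)$ (its definition lies outside this passage); the joint distribution and the function names are hypothetical and chosen only for illustration.

```python
def tilted_posterior(posterior, theta):
    """Tilt a posterior P(.|y): P^theta / sum P^theta, normalized by the max for stability."""
    m = max(posterior)
    weights = [(p / m) ** theta for p in posterior]
    total = sum(weights)
    return [w / total for w in weights]


def map_error(joint):
    """Minimum (MAP) error probability for a joint pmf given as joint[x][y]."""
    n_y = len(joint[0])
    return 1.0 - sum(max(row[y] for row in joint) for y in range(n_y))


def tilted_tail(joint, theta, alpha):
    """P_{X,Y}{ (x, y) : tilted posterior of x given y is <= alpha }."""
    n_x, n_y = len(joint), len(joint[0])
    total = 0.0
    for y in range(n_y):
        p_y = sum(joint[x][y] for x in range(n_x))
        posterior = [joint[x][y] / p_y for x in range(n_x)]
        tilted = tilted_posterior(posterior, theta)
        total += sum(joint[x][y] for x in range(n_x) if tilted[x] <= alpha)
    return total


# Toy joint pmf with a unique posterior maximizer for each observation y.
joint = [[0.30, 0.05],
         [0.10, 0.25],
         [0.10, 0.20]]

for theta in (1.0, 5.0, 50.0, 200.0):
    print(theta, tilted_tail(joint, theta, alpha=0.5), map_error(joint))
```

As $\theta$ grows, the tilted posterior concentrates on the MAP estimate, so the printed probability approaches the MAP error probability; for every $\theta \ge 1$, multiplying it by $(1 - \alpha)$ gives a valid lower bound on that error probability.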
An information-spectral representation of $E^*(R)$ for the memoryless finite-input AWGN channel is thus
established for all rates, although its solution in closed (single-letter) form is still a daunting task.

We emphasize that the above finding also holds for any channel satisfying $\ell(Y^n) = 1$ almost surely in
$P_{Y^n}$, as shown above; we hence have the following result (which directly follows from Theorem 2 along
the same lines as the above proof).

Corollary 1: Given a channel $W$, if for its input $X$ uniform over any block codebook $\mathcal{C}_n$, the following
holds almost surely in $P_{Y^n}$

$$\max_{x^n \in \mathcal{C}_n} P_{Y^n|X^n}(y^n|x^n) > \max_{x^n \in \mathcal{C}_n \setminus \{e_{ML}(y^n)\}} P_{Y^n|X^n}(y^n|x^n) \tag{39}$$

for each $n = 1, 2, \ldots$, where $e_{ML}(y^n) = \arg\max_{x^n \in \mathcal{C}_n} P_{Y^n|X^n}(y^n|x^n)$ is the maximum likelihood estimate
of the transmitted codeword from the received channel output $y^n$, then the channel reliability function of
$W$ is given by

$$E^*(R) = \bar{E}_{PV}(R)$$

$$= \sup_{X \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R \right\}$$

for any $0 < R < C$, where $C$ is the channel's capacity.

Furthermore, if the channel satisfies the invariance conditions (35) or (37), then $j^{(\theta)}_{X^nW^n}(x^n; y^n) =
i^{(\theta)}_{X^nW^n}(x^n; y^n)$, which is the information density for the auxiliary channel with transition distribution
$P^{(\theta)}_{Y^n|X^n}$ (i.e., the tilted distribution of the original channel distribution $P_{Y^n|X^n}$). In this case the channel
reliability function becomes

$$E^*(R) = \bar{E}_{PV}(R)$$

$$= \sup_{X \in Q(R)} \liminf_{n \to \infty} \lim_{\theta \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} i^{(\theta)}_{X^nW^n}(x^n; y^n) \le R \right\}$$

for any $0 < R < C$.

Observation 6: Corollary 1 requires condition (39) to be valid for any block codebook $\mathcal{C}_n$ and for
each $n = 1, 2, \ldots$. One can immediately weaken the condition by considering only sufficiently large $n$;
but without further knowledge on the optimal codebook (equivalently, the optimal channel input $X$ that
achieves $\bar{E}_{PV}(R)$), it may be hard to derive an alternative condition for (39) that holds uniformly for
any codebook. In particular, for discrete memoryless channels (DMC) with finite or countably infinite
output alphabets, a codebook that fails condition (39) can always be constructed unless the channel
is noiseless (i.e., perfect).^1 Hence, in its current form, Corollary 1 is not useful for discrete-output
channels; instead, it is of interest for continuous-output channels.

^1 As a simple proof, note that for a noisy DMC there exist two inputs $a, a' \in \mathcal{X}$ and an output $b \in \mathcal{Y}$ satisfying
$\min\{P_{Y|X}(b|a), P_{Y|X}(b|a')\} > 0$. Then for a codebook $\mathcal{C}_n$ consisting of two distinct codewords $x^n$ and $\tilde{x}^n$, where one of them is
the permutation of the other, and their components are either $a$ or $a'$, we obtain

$$P_{Y^n|X^n}(y^n|x^n) = \prod_{i=1}^{n} P_{Y|X}(b|x_i) = \prod_{i=1}^{n} P_{Y|X}(b|\tilde{x}_i) = P_{Y^n|X^n}(y^n|\tilde{x}^n)$$

for the channel output $y^n$ satisfying $y_i = b$ for every $1 \le i \le n$; hence, $\ell(y^n) \ge 2$ with $P_{Y^n}(y^n) = \frac{1}{2} P_{Y^n|X^n}(y^n|x^n) + \frac{1}{2} P_{Y^n|X^n}(y^n|\tilde{x}^n) >
0$. This codebook therefore violates condition (39).

Notably, for a channel satisfying $\min\{P_{Y|X}(b|a), P_{Y|X}(b|a')\} = 0$ for every unequal $a, a' \in \mathcal{X}$ and $b \in \mathcal{Y}$, the error rate is zero for any
codebook $\mathcal{C}_n$. So, only under such a noiseless situation can the finite- or countable-output DMC meet the strict requirement that $\ell(Y^n) = 1$
with probability one for any codebook $\mathcal{C}_n$.

Observation 7: In light of the above observation, we further consider channels with continuous-output
alphabets. For a channel that admits a channel transition pdf, the proof of Theorem 4 actually indicates
that as long as $P_{Y^n}\{\mathcal{Y}(\mathcal{C}_n)\} = 0$ for any block codebook $\mathcal{C}_n$, where

$$\mathcal{Y}(\mathcal{C}_n) \triangleq \left\{ y^n \in \mathbb{R}^n : f_{Y^n|X^n}(y^n|x^n) = f_{Y^n|X^n}(y^n|\tilde{x}^n) \text{ for some } x^n, \tilde{x}^n \in \mathcal{C}_n \text{ and } x^n \ne \tilde{x}^n \right\},$$

we have $\Pr[\ell(Y^n) = 1] = 1$ and (39) holds. We note that this is indeed valid for any sequence of transition
pdfs for which the number of solutions in $y_n$ satisfying

$$f_{Y^n|X^n}(y^n|x^n) = f_{Y^n|X^n}(y^n|\tilde{x}^n)$$

for given codewords $x^n, \tilde{x}^n$ in $\mathcal{C}_n$ and given $y^{n-1}$ is either finite or countable (as this condition immediately
implies that $\mathcal{Y}(\mathcal{C}_n)$ has Lebesgue measure zero). A large class of channels satisfies this condition. For
example, channels with memoryless additive noise, where the noise pdf is not uniform or piecewise-
uniform, satisfy this condition and hence (39) and Corollary 1. This allows for most standard continuous
distributions for the noise, such as the generalized-Gaussian distribution with shape parameter $c > 0$ (e.g.,
cf. [11]); this distribution includes the Gaussian and Laplacian distributions as special cases, realized for
$c = 2$ and $c = 1$, respectively.

D. Examples of Channels for which the $E^{(\theta)}_{PV}(R)$ Bound Is Not Tight

As already mentioned, the (analytical or numerical) computation of both upper bounds, $E^{(\theta)}_{PV}(R)$ and
$\bar{E}_{PV}(R)$, to the channel reliability function, given in (32) and (33), respectively, is formidable since they
involve a difficult supremum operation over input processes in $Q(R)$ in addition to the limit operations.
We can however lower-bound $E^{(\theta)}_{PV}(R)$, for a given (fixed) $\theta$, using an auxiliary class of i.i.d. inputs
and compare this lower bound to $E^{(\theta)}_{PV}(R)$ with familiar channel reliability function upper bounds (such
as the sphere-packing upper bound). If the former is shown to be strictly larger than the latter for a range
of rates, then this indicates that for that particular $\theta$, $E^{(\theta)}_{PV}(R)$ is not tight. The lower bound to $E^{(\theta)}_{PV}(R)$,
which we denote by $F(R, \theta)$, is derived in Appendix B and given in (43) for the case of memoryless
channels. We herein calculate $F(R, \theta)$ numerically to demonstrate that $E^{(\theta)}_{PV}(R)$ is not tight within a rate
range and for certain choices of $\theta$ (including $\theta = 1$, which gives the Poor-Verdú bound of (28)): this is
shown for two standard binary-input memoryless channels: the BSC and the Z-channel.

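1) Memoryless BSC: For the BSC with crossover probability $\varepsilon$, setting $p = P_X(1)$ and $s = \frac{1}{1-\rho}$ in (43)
yields

$$E^{(\theta)}_{PV}(R) \ge F(R, \theta) = \sup_{0 < s < 1} \left\{ \left(1 - \frac{1}{s}\right) R - \inf_{p :\, h_b(p) > R} \log \Big( A_0\, B_0^{(1-s)/s} + A_1\, B_1^{(1-s)/s} \Big) \right\},$$

where

$$A_0 = (1-p)(1-\varepsilon)^{1+\theta-\theta/s} + p\, \varepsilon^{1+\theta-\theta/s}, \qquad B_0 = (1-p)(1-\varepsilon)^{\theta} + p\, \varepsilon^{\theta},$$

$$A_1 = (1-p)\, \varepsilon^{1+\theta-\theta/s} + p\, (1-\varepsilon)^{1+\theta-\theta/s}, \qquad B_1 = (1-p)\, \varepsilon^{\theta} + p\, (1-\varepsilon)^{\theta},$$

for reals $\theta \ge 1$ and $0 < R < C = \log(2) - h_b(\varepsilon)$, where $C$ is the channel capacity and $h_b(\varepsilon) =
-\varepsilon \log \varepsilon - (1 - \varepsilon) \log(1 - \varepsilon)$ is the binary entropy function.

We compare $F(R, \theta)$ with the sphere-packing upper bound to the BSC's reliability function (e.g., [9],
[5]), which is denoted by $E_{sp}(R)$ and given by

$$E_{sp}(R) = \sup_{0 < s < 1} \left\{ \left(1 - \frac{1}{s}\right)(R - \log 2) - \frac{1}{s} \log \left[ (1 - \varepsilon)^s + \varepsilon^s \right] \right\}$$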
for $0 < R < C$. In Fig. 4, we plot $E_{sp}(R)$ and $F(R, \theta)$ for $\theta = 1$ and 2 and $\varepsilon = 0.01$. The figure indicates
that for $\theta = 1$, $F(R, \theta) > E_{sp}(R)$ for all rates $R$. This directly implies that

$$E_{PV}(R) = E^{(\theta=1)}_{PV}(R) \ge F(R, 1) > E_{sp}(R)$$

for all $0 < R < C$. Now recall that the sphere-packing upper bound $E_{sp}(R)$ is loose at low rates (for rates
$R$ less than the critical rate [9]) and tight (i.e., exactly equal to the channel reliability function $E^*(R)$)
at high rates (rates between the critical rate and capacity). Thus for the BSC, the Poor-Verdú bound of
(28) is not tight at any rate $0 < R < C$. Furthermore, note from the figure that since $F(R, \theta) < E_{sp}(R)$ for $\theta = 2$,
we cannot draw a conclusion regarding the tightness of $E^{(\theta)}_{PV}(R)$ in this case (this is also observed for
$\theta > 2$).
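The sphere-packing expression for the BSC is straightforward to evaluate numerically. The following is a minimal sketch (natural logarithms, a plain grid search over $s$, and hypothetical function names) rather than the procedure used to produce Fig. 4; since the grid only samples $0 < s < 1$, it approximates the supremum from below.

```python
import math


def binary_entropy(p):
    """h_b(p) in nats, with h_b(0) = h_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def bsc_capacity(eps):
    """C = log(2) - h_b(eps) in nats."""
    return math.log(2.0) - binary_entropy(eps)


def bsc_sphere_packing(rate, eps, grid=2000):
    """Grid approximation of
    sup_{0<s<1} (1 - 1/s)(R - log 2) - (1/s) log[(1-eps)^s + eps^s]."""
    best = -math.inf
    for k in range(1, grid):
        s = k / grid
        value = (1.0 - 1.0 / s) * (rate - math.log(2.0)) \
            - math.log((1.0 - eps) ** s + eps ** s) / s
        best = max(best, value)
    return best


eps = 0.01
capacity = bsc_capacity(eps)
for rate in (0.1, 0.3, 0.5, capacity):
    print(round(rate, 4), bsc_sphere_packing(rate, eps))
```

The computed exponent is positive and decreasing for rates below capacity and vanishes (up to grid resolution) at $R = C$, as expected of a sphere-packing exponent.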



Fig. 4. BSC with crossover probability $\varepsilon = 0.01$: lower bound $F(R, \theta)$ to $E^{(\theta)}_{PV}(R)$ for $\theta = 1, 2$ and the sphere-packing bound $E_{sp}(R)$.



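2) Memoryless Z-Channel: We next consider the memoryless binary Z-channel described by $P_{Y|X}(0|1) =
\varepsilon$ and $P_{Y|X}(0|0) = 1$. Again, setting $p = P_X(1)$ and $s = \frac{1}{1-\rho}$ in (43) yields

$$E^{(\theta)}_{PV}(R) \ge F(R, \theta) = \sup_{0 < s < 1} \left\{ \left(1 - \frac{1}{s}\right) R - \inf_{p :\, h_b(p) > R} \log \left[ \frac{1 - p + p\, \varepsilon^{1+\theta-\theta/s}}{\left(1 - p + p\, \varepsilon^{\theta}\right)^{1 - 1/s}} + p^{1/s} (1 - \varepsilon) \right] \right\}$$

for $\theta \ge 1$ and $0 < R < C = \log\left(1 + (1 - \varepsilon)\, \varepsilon^{\varepsilon/(1-\varepsilon)}\right)$. Furthermore, the channel's sphere-packing upper
bound is given by

$$E_{sp}(R) = \sup_{0 < s < 1} \left\{ \left(1 - \frac{1}{s}\right) R - \inf_{0 \le p \le 1} \log \left[ \left(1 - p + p\, \varepsilon^{s}\right)^{1/s} + p^{1/s} (1 - \varepsilon) \right] \right\}$$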
for $0 < R < C$. In Fig. 5, we plot $E_{sp}(R)$ and $F(R, \theta)$ for $\theta = 1, 3, 10, 100$ and $\varepsilon = 0.01$. We remark from
the figure that for all considered values of $\theta$ (including very large $\theta$ not shown herein), $F(R, \theta) > E_{sp}(R)$
for high rates. This leads us to conclude that for the Z-channel, the bound $E^{(\theta)}_{PV}(R)$ of (32) is not tight at high
rates even when $\theta$ approaches infinity.
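The closed-form Z-channel capacity quoted above can be cross-checked against a direct maximization of the mutual information over the input probability $p = P_X(1)$. The sketch below uses the standard identity $I(X;Y) = h_b(p(1-\varepsilon)) - p\, h_b(\varepsilon)$ for this channel (in nats); the function names and the grid resolution are illustrative assumptions.

```python
import math


def binary_entropy(p):
    """h_b(p) in nats, with h_b(0) = h_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def z_mutual_information(p, eps):
    """I(X;Y) for P_X(1) = p, P_{Y|X}(0|1) = eps, P_{Y|X}(0|0) = 1."""
    return binary_entropy(p * (1.0 - eps)) - p * binary_entropy(eps)


def z_capacity_closed_form(eps):
    """C = log(1 + (1 - eps) * eps^(eps / (1 - eps)))."""
    return math.log(1.0 + (1.0 - eps) * eps ** (eps / (1.0 - eps)))


def z_capacity_grid(eps, grid=100000):
    """Maximize I(X;Y) over a uniform grid of input probabilities."""
    return max(z_mutual_information(k / grid, eps) for k in range(grid + 1))


eps = 0.01
print(z_capacity_closed_form(eps), z_capacity_grid(eps))
```

Both values agree to within the grid resolution and lie between 0.6 and $\log 2$, consistent with the position of $C$ on the rate axis of Fig. 5.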



Fig. 5. Z-channel with crossover probability $\varepsilon = 0.01$: lower bound $F(R, \theta)$ to $E^{(\theta)}_{PV}(R)$ for $\theta = 1, 3, 10, 100$ and the sphere-packing
bound $E_{sp}(R)$.

Observation 8: It should be emphasized that the above numerical examples regarding the looseness of
$E^{(\theta)}_{PV}(R)$ within a rate region and for given values of $\theta$ do not shed any light on the tightness of $\bar{E}_{PV}(R)$
given in (33), since the expression of $\bar{E}_{PV}(R)$ requires taking the limit with respect to $\theta$ before taking the
limit with respect to the blocklength $n$.



V. Conclusion 

In this work, we generalized the Poor-Verdú lower bound for the multihypothesis testing error probability.
The new bound, which involves the tilted posterior distribution of the hypothesis given the observation
with tilting parameter $\theta$, reduces to the original Poor-Verdú bound when $\theta = 1$. We established a sufficient
condition under which the bound (without its multiplicative factor) provides the exact error probability
as $\theta \to \infty$. We also provided some examples to illustrate the tightness of the bound in terms of $\theta$.

We next applied the new bound to obtain two new information-spectrum-based upper bounds to the
reliability function of general channels with memory, $E^{(\theta)}_{PV}(R)$ and $\bar{E}_{PV}(R)$, given in (32) and (33),
respectively. It was shown that $\bar{E}_{PV}(R)$ is tight at all rates (below channel capacity) for a class of
channels that includes the finite-input memoryless Gaussian channel, hence providing an information-
spectral characterization of these channels' reliability function. The determination of $\bar{E}_{PV}(R)$ in closed
form and its calculation remain a challenging problem (especially at low rates), as they involve taking the limit
with respect to $\theta$ followed by optimizing the resulting large-deviation rate function over a constrained
set of input processes (see (33)). It is anticipated that i.i.d. channel inputs are unlikely to be a valid
optimizer for $\bar{E}_{PV}(R)$. Although the evaluation of $\bar{E}_{PV}(R)$ for non-i.i.d. channel inputs appears difficult,
the judicious use of Markovian inputs might be worth investigating in the future.

Appendix A 
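Lemma 1: If the limit (in $n$) of $a_{n,j}$ exists for every $j = 1, 2, 3, \ldots$, then

$$\lim_{n \to \infty} \sum_{j=1}^{\infty} a_{n,j} = \sum_{j=1}^{\infty} \lim_{n \to \infty} a_{n,j}.$$

Proof: Since for any sequences $\{b_n\}$ and $\{c_n\}$,

$$\liminf_{n \to \infty} (b_n + c_n) \ge \liminf_{n \to \infty} b_n + \liminf_{n \to \infty} c_n,$$

we recursively have that

$$\liminf_{n \to \infty} \sum_{j=1}^{\infty} a_{n,j} \ge \liminf_{n \to \infty} a_{n,1} + \liminf_{n \to \infty} \sum_{j=2}^{\infty} a_{n,j}$$

$$\ge \liminf_{n \to \infty} a_{n,1} + \liminf_{n \to \infty} a_{n,2} + \liminf_{n \to \infty} \sum_{j=3}^{\infty} a_{n,j}$$

$$\ge \cdots \ge \sum_{j=1}^{\infty} \liminf_{n \to \infty} a_{n,j}.$$

Similarly, since

$$\limsup_{n \to \infty} (b_n + c_n) \le \limsup_{n \to \infty} b_n + \limsup_{n \to \infty} c_n,$$

we obtain that

$$\limsup_{n \to \infty} \sum_{j=1}^{\infty} a_{n,j} \le \sum_{j=1}^{\infty} \limsup_{n \to \infty} a_{n,j}.$$

Since

$$\limsup_{n \to \infty} a_{n,j} = \liminf_{n \to \infty} a_{n,j} = \lim_{n \to \infty} a_{n,j} \quad \text{for every } j,$$

we have

$$\sum_{j=1}^{\infty} \lim_{n \to \infty} a_{n,j} \ge \limsup_{n \to \infty} \sum_{j=1}^{\infty} a_{n,j} \ge \liminf_{n \to \infty} \sum_{j=1}^{\infty} a_{n,j} \ge \sum_{j=1}^{\infty} \lim_{n \to \infty} a_{n,j},$$

which immediately yields the desired result.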
Appendix B

We derive a lower bound to $E^{(\theta)}_{PV}(R)$ given in (32), which can be numerically evaluated for different
values of $\theta$ when the channel is memoryless.

Consider a general channel $W = \{W^n\}_{n=1}^{\infty}$ with finite input alphabet $\mathcal{X}$ and arbitrary output alphabet
$\mathcal{Y}$. Fix $R > 0$. Given an i.i.d. process $X = \{X^n\}_{n=1}^{\infty}$ with alphabet $\mathcal{X}$ and entropy $H(X) > R$ and a
constant $0 < \delta < H(X) - R$ arbitrarily small, define the (weakly) $\delta$-typical set as:

$$\mathcal{F}_n(\delta|X) \triangleq \left\{ x^n \in \mathcal{X}^n : \left| -\frac{1}{n} \log P_{X^n}(x^n) - H(X) \right| < \delta \right\} = \left\{ x^n \in \mathcal{X}^n : \left| -\frac{1}{n} \sum_{i=1}^{n} \log P_X(x_i) - H(X) \right| < \delta \right\}.$$



We now recall the consequence of the Asymptotic Equipartition Property for i.i.d. (memoryless) sources
(e.g., see [5], [7]).

Proposition 2: Given an i.i.d. source $\{X_n\}_{n=1}^{\infty}$ with entropy $H(X)$ and any $\delta$ greater than zero,
its $\delta$-typical set $\mathcal{F}_n(\delta|X)$ satisfies the following.

1) If $x^n \in \mathcal{F}_n(\delta|X)$, then $e^{-n(H(X)+\delta)} < P_{X^n}(x^n) < e^{-n(H(X)-\delta)}$.

2) $P_{X^n}(\mathcal{F}_n^c(\delta|X)) < \delta$ for sufficiently large $n$, where the superscript "c" denotes the complement set
operation.

3) $|\mathcal{F}_n(\delta|X)| > (1 - \delta) e^{n(H(X)-\delta)}$ for sufficiently large $n$, and $|\mathcal{F}_n(\delta|X)| < e^{n(H(X)+\delta)}$ for every $n$,
where $|\mathcal{F}_n(\delta|X)|$ denotes the number of elements in $\mathcal{F}_n(\delta|X)$.
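The properties of Proposition 2 can be verified numerically for a binary i.i.d. source. The sketch below is an illustration under assumed parameters (a Bernoulli($q$) source with $q = 0.2$, $\delta = 0.05$ and $n = 2000$, natural logarithms) and a hypothetical function name; it counts typical sequences exactly by grouping them according to their number of ones.

```python
import math


def typical_set_stats(n, q, delta):
    """Return (log |F_n|, P(F_n), H) for a Bernoulli(q) source, in nats."""
    entropy = -q * math.log(q) - (1 - q) * math.log(1 - q)
    size = 0      # exact integer count of typical sequences
    prob = 0.0    # probability of the typical set
    for k in range(n + 1):  # k = number of ones in x^n
        log_p_seq = k * math.log(q) + (n - k) * math.log(1 - q)
        if abs(-log_p_seq / n - entropy) < delta:
            size += math.comb(n, k)
            # log of the binomial coefficient, to avoid float overflow
            log_count = (math.lgamma(n + 1) - math.lgamma(k + 1)
                         - math.lgamma(n - k + 1))
            prob += math.exp(log_count + log_p_seq)
    return math.log(size), prob, entropy


n, q, delta = 2000, 0.2, 0.05
log_size, prob, entropy = typical_set_stats(n, q, delta)
print("P(typical set)      :", prob)
print("(1/n) log |F_n|     :", log_size / n)
print("H - delta, H + delta:", entropy - delta, entropy + delta)
```

With these parameters the typical set already carries probability above $1 - \delta$, and its normalized log-cardinality falls between the bounds of property 3.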

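Let $\bar{X} = \{\bar{X}^n\}_{n=1}^{\infty}$ be a process that is uniformly distributed over $\mathcal{F}_n(\delta|X)$ for each $n$; i.e., $P_{\bar{X}^n}(x^n) =
\frac{1}{|\mathcal{F}_n(\delta|X)|}$ for $x^n \in \mathcal{F}_n(\delta|X)$ and $n = 1, 2, \ldots$. From Proposition 2, we also obtain that for $n$ sufficiently
large and $x^n \in \mathcal{F}_n(\delta|X)$,

$$(1 - \delta) e^{-2n\delta} \le \frac{P_{X^n}(x^n)}{P_{\bar{X}^n}(x^n)} = P_{X^n}(x^n)\, |\mathcal{F}_n(\delta|X)| \le e^{2n\delta}. \tag{40}$$

For $\bar{X}$ to belong to the set $Q(R)$ as defined in (29), it is required that

$$\liminf_{n \to \infty} \frac{1}{n} \log |S(\bar{X}^n)| = \liminf_{n \to \infty} \frac{1}{n} \log |\mathcal{F}_n(\delta|X)| > R. \tag{41}$$

But condition (41) can be guaranteed by setting $H(X) > R$ and taking $\delta < H(X) - R$ (as already
assumed) since

$$\liminf_{n \to \infty} \frac{1}{n} \log |\mathcal{F}_n(\delta|X)| \ge \liminf_{n \to \infty} \frac{1}{n} \log \left[ (1 - \delta) e^{n(H(X)-\delta)} \right] = H(X) - \delta > R,$$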
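where the inequality follows from property 3 of Proposition 2. Hence, such a process $\{\bar{X}^n\}_{n=1}^{\infty}$, uniformly
distributed over its support, belongs to $Q(R)$. Thus, we can lower-bound $E^{(\theta)}_{PV}(R)$ for channel $W =
\{W^n\}_{n=1}^{\infty}$ and a given $\theta \ge 1$ as follows:

$$E^{(\theta)}_{PV}(R) \triangleq \sup_{X \in Q(R)} \liminf_{n \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R \right\}$$

$$\ge \liminf_{n \to \infty} -\frac{1}{n} \log P_{\bar{X}^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{\bar{X}^nW^n}(x^n; y^n) \le R \right\}.$$

For $n$ sufficiently large, we can write

$$j^{(\theta)}_{\bar{X}^nW^n}(x^n; y^n) = \log \frac{P^{\theta}_{Y^n|X^n}(y^n|x^n)}{\sum_{\hat{x}^n \in \mathcal{F}_n(\delta|X)} P_{\bar{X}^n}(\hat{x}^n)\, P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n)}$$

$$\ge \log \frac{P^{\theta}_{Y^n|X^n}(y^n|x^n)}{\frac{e^{2n\delta}}{1-\delta} \sum_{\hat{x}^n \in \mathcal{F}_n(\delta|X)} P_{X^n}(\hat{x}^n)\, P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n)}$$

$$\ge \log \frac{(1 - \delta) e^{-2n\delta}\, P^{\theta}_{Y^n|X^n}(y^n|x^n)}{\sum_{\hat{x}^n \in \mathcal{X}^n} P_{X^n}(\hat{x}^n)\, P^{\theta}_{Y^n|X^n}(y^n|\hat{x}^n)}$$

$$= \log(1 - \delta) - 2n\delta + j^{(\theta)}_{X^nW^n}(x^n; y^n),$$

where the first inequality follows from the lower bound in (40). Accordingly,

$$E^{(\theta)}_{PV}(R) \ge \liminf_{n \to \infty} -\frac{1}{n} \log P_{\bar{X}^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{\bar{X}^nW^n}(x^n; y^n) \le R \right\}$$

$$\ge \liminf_{n \to \infty} -\frac{1}{n} \log P_{\bar{X}^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} \log(1 - \delta) - 2\delta + \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R \right\}$$

$$= \liminf_{n \to \infty} -\frac{1}{n} \log P_{\bar{X}^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R - \frac{1}{n} \log(1 - \delta) + 2\delta \right\}. \tag{42}$$

Observe that for $x^n \in \mathcal{F}_n(\delta|X)$,

$$P_{\bar{X}^nW^n}(x^n, y^n) = P_{\bar{X}^n}(x^n)\, P_{Y^n|X^n}(y^n|x^n) \le \frac{e^{2n\delta}}{1 - \delta}\, P_{X^n}(x^n)\, P_{Y^n|X^n}(y^n|x^n) = \frac{e^{2n\delta}}{1 - \delta}\, P_{X^nW^n}(x^n, y^n),$$

where the inequality follows from (40). Then, we can further lower-bound the right-hand side term of
(42) to obtain

$$E^{(\theta)}_{PV}(R) \ge \liminf_{n \to \infty} -\frac{1}{n} \log \left( \frac{e^{2n\delta}}{1 - \delta}\, P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R - \frac{1}{n} \log(1 - \delta) + 2\delta \right\} \right)$$

$$\ge \liminf_{n \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R + \gamma \right\} - 2\delta,$$

where it suffices to take $\gamma > 2\delta$ to have $\gamma > -\frac{1}{n} \log(1 - \delta) + 2\delta$ for $n$ sufficiently large.

In summary, we have shown that for any channel $W = \{W^n\}_{n=1}^{\infty}$, the upper bound $E^{(\theta)}_{PV}(R)$ to its
channel reliability function satisfies

$$E^{(\theta)}_{PV}(R) \ge \liminf_{n \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R + \gamma \right\} - 2\delta$$

for $\theta \ge 1$ and any i.i.d. input process $X$ with

$$H(X) > R, \qquad 0 < \delta < H(X) - R, \qquad \gamma > 2\delta.$$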

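We next specialize the above lower bound for the case when channel $W$ is memoryless. For a memoryless
channel with an i.i.d. input, we have for $\rho < 0$,

$$P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R + \gamma \right\}$$

$$= P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \rho \sum_{i=1}^{n} \log \frac{P^{\theta}_{Y|X}(y_i|x_i)}{\sum_{x' \in \mathcal{X}} P_X(x')\, P^{\theta}_{Y|X}(y_i|x')} \ge n\rho(R + \gamma) \right\}$$

$$\le e^{-n\rho(R+\gamma)} \left( \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_X(x)\, P_{Y|X}(y|x) \left[ \frac{P^{\theta}_{Y|X}(y|x)}{\sum_{x' \in \mathcal{X}} P_X(x')\, P^{\theta}_{Y|X}(y|x')} \right]^{\rho} \right)^{n},$$

where the inequality follows from Markov's inequality. Thus, for $\rho < 0$, we have

$$E^{(\theta)}_{PV}(R) \ge \liminf_{n \to \infty} -\frac{1}{n} \log P_{X^nW^n}\left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \frac{1}{n} j^{(\theta)}_{X^nW^n}(x^n; y^n) \le R + \gamma \right\} - 2\delta$$

$$\ge \liminf_{n \to \infty} -\frac{1}{n} \log \left( e^{-n\rho(R+\gamma)} \left( \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_X(x)\, P_{Y|X}(y|x) \left[ \frac{P^{\theta}_{Y|X}(y|x)}{\sum_{x' \in \mathcal{X}} P_X(x')\, P^{\theta}_{Y|X}(y|x')} \right]^{\rho} \right)^{n} \right) - 2\delta$$

$$= \rho(R + \gamma) - \log \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_X(x)\, P_{Y|X}(y|x) \left[ \frac{P^{\theta}_{Y|X}(y|x)}{\sum_{x' \in \mathcal{X}} P_X(x')\, P^{\theta}_{Y|X}(y|x')} \right]^{\rho} - 2\delta.$$

Since $\rho < 0$, $\gamma$ should be made as small as possible. But as $\gamma > 2\delta$, it should thus approach $2\delta$ to obtain

$$E^{(\theta)}_{PV}(R) \ge \rho R - \log \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_X(x)\, P_{Y|X}(y|x) \left[ \frac{P^{\theta}_{Y|X}(y|x)}{\sum_{x' \in \mathcal{X}} P_X(x')\, P^{\theta}_{Y|X}(y|x')} \right]^{\rho} - 2(1 - \rho)\delta.$$

Taking $\delta \downarrow 0$ yields the following lower bound to $E^{(\theta)}_{PV}(R)$ for a memoryless channel:

$$E^{(\theta)}_{PV}(R) \ge \sup_{P_X :\, H(X) > R}\ \sup_{\rho < 0} \left\{ \rho R - \log \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_X(x)\, P_{Y|X}(y|x) \left[ \frac{P^{\theta}_{Y|X}(y|x)}{\sum_{x' \in \mathcal{X}} P_X(x')\, P^{\theta}_{Y|X}(y|x')} \right]^{\rho} \right\} \triangleq F(R, \theta) \tag{43}$$

for $\theta \ge 1$.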
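The Markov-inequality step used in this appendix can be checked against an exact computation in a simple case. The sketch below assumes a BSC with crossover probability $\varepsilon$ and a uniform i.i.d. input, for which the per-letter tilted density takes only two values (no flip or flip), and compares the exact probability of the event $\{\frac{1}{n} j^{(\theta)}_{X^nW^n} \le R\}$ with the bound $e^{-n\rho R} (\mathrm{E}[e^{\rho j}])^n$ for several $\rho < 0$. All function names and parameter values are illustrative assumptions.

```python
import math


def per_letter_density(eps, theta):
    """Per-letter tilted density for a BSC with uniform input: (no flip, flip)."""
    denom = 0.5 * ((1 - eps) ** theta + eps ** theta)
    return (math.log((1 - eps) ** theta / denom),
            math.log(eps ** theta / denom))


def exact_tail(n, rate, eps, theta):
    """Exact P[(1/n) * sum of per-letter densities <= rate]."""
    j_ok, j_flip = per_letter_density(eps, theta)
    total = 0.0
    for k in range(n + 1):  # k = number of flipped symbols
        if ((n - k) * j_ok + k * j_flip) / n <= rate:
            total += math.comb(n, k) * eps ** k * (1 - eps) ** (n - k)
    return total


def chernoff_bound(n, rate, eps, theta, rho):
    """Markov bound exp(-n*rho*rate) * (E[exp(rho*j)])**n, valid for rho < 0."""
    j_ok, j_flip = per_letter_density(eps, theta)
    mgf = (1 - eps) * math.exp(rho * j_ok) + eps * math.exp(rho * j_flip)
    return math.exp(-n * rho * rate) * mgf ** n


n, rate, eps, theta = 30, 0.3, 0.1, 2.0
print("exact:", exact_tail(n, rate, eps, theta))
for rho in (-0.1, -0.5, -1.0, -2.0):
    print("rho =", rho, "bound:", chernoff_bound(n, rate, eps, theta, rho))
```

Every bound dominates the exact probability, and optimizing over $\rho < 0$ (followed by the supremum over input distributions) is what produces the exponent $F(R, \theta)$ in (43).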



References 

[1] F. Alajaji, P.-N. Chen and Z. Rached, "A note on the Poor-Verdú upper bound for the channel reliability function," IEEE Trans. Inform. Theory, vol. 48, no. 1, pp. 309-313, Jan. 2002.
[2] A. Barg and A. McGregor, "Distance distribution of binary codes and the error probability of decoding," IEEE Trans. Inform. Theory, vol. 51, pp. 4237-4246, Dec. 2005.
[3] Y. Ben Haim and S. Litsyn, "Improved upper bounds on the reliability function of the Gaussian channel," IEEE Trans. Inform. Theory, vol. 54, no. 1, pp. 5-12, Jan. 2008.
[4] P. Billingsley, Probability and Measure, Second Edition, Wiley, NY, 1986.
[5] R. Blahut, Principles and Practice of Information Theory, Addison Wesley, MA, 1988.
[6] J. A. Bucklew, Large Deviation Techniques in Decision, Simulation, and Estimation, Wiley, NY, 1990.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory, New York: Wiley, 2nd Ed., 2006.
[8] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, NY, 1981.
[9] R. G. Gallager, Information Theory and Reliable Communication, Wiley, NY, 1968.
[10] T. S. Han, Information-Spectrum Methods in Information Theory, Springer, 2003.
[11] J. H. Miller and J. B. Thomas, "Detectors for discrete-time signals in non-Gaussian noise," IEEE Trans. Inform. Theory, vol. 18, no. 2, pp. 241-250, Mar. 1972.
[12] H. V. Poor and S. Verdú, "A lower bound on the probability of error in multi-hypothesis testing," IEEE Trans. Inform. Theory, vol. 41, no. 6, pp. 1992-1994, Nov. 1995.
[13] C. E. Shannon, "Certain results in coding theory for noisy channels," Inform. Contr., vol. 1, pp. 6-25, Sep. 1957.
[14] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. Inform. Theory, vol. 40, no. 4, pp. 1147-1157, July 1994.
[15] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding, McGraw-Hill, NY, 1979.



